Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sycl-exp : Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)" #7980

Merged
merged 2 commits into from
Jun 18, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 9 additions & 11 deletions ggml-sycl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7984,26 +7984,24 @@ static void mul_mat_vec_q(const void * __restrict__ vx, const void * __restrict_
const int blocks_per_row = ncols / qk;
const int blocks_per_warp = vdr * WARP_SIZE / qi;

const int qi_vdr = (qi / vdr); // N_threads processing 1 qk block

// partial sum for each thread
// partial sum for each thread
joeatodd marked this conversation as resolved.
Show resolved Hide resolved
float tmp = 0.0f;

const block_q_t * x = (const block_q_t *) vx;
const block_q8_1 * y = (const block_q8_1 *) vy;

for (int i = item_ct1.get_local_id(2) / qi_vdr; i < blocks_per_row;
for (int i = item_ct1.get_local_id(2) / (qi / vdr); i < blocks_per_row;
i += blocks_per_warp) {
const int ibx = row * blocks_per_row + i; // x block index
const int ibx = row*blocks_per_row + i; // x block index

const int iby = i * (qk / QK8_1); // y block index that aligns with ibx
const int iby = i * (qk/QK8_1); // y block index that aligns with ibx

const int iqs =
vdr *
(item_ct1.get_local_id(2) -
i * qi_vdr); // x block quant index when casting the quants to int
const int iqs =
vdr *
(item_ct1.get_local_id(2) %
(qi / vdr)); // x block quant index when casting the quants to int

tmp += vec_dot_q_sycl(&x[ibx], &y[iby], iqs);
tmp += vec_dot_q_sycl(&x[ibx], &y[iby], iqs);
}

// sum up partial sums and write back result
Expand Down
Loading