[GPU] model cache fix from kv cache compression (openvinotoolkit#27323)
### Details:
 - Model cache was not working because of a load/save mismatch
isanghao authored Oct 30, 2024
1 parent 9036b59 commit 8da8a30
Showing 1 changed file with 1 addition and 1 deletion. The change moves `ob << is_kv_compressed;` within `save()`; both its old and its new position appear in the hunk below.
@@ -116,14 +116,14 @@ struct scaled_dot_product_attention : public primitive_base<scaled_dot_product_a
    void save(BinaryOutputBuffer& ob) const override {
        primitive_base<scaled_dot_product_attention>::save(ob);
        ob << is_causal;
        ob << is_kv_compressed;
        ob << has_attn_mask_input;
        ob << has_scale_input;
        ob << indirect_axis;
        ob << input_q_transpose_order;
        ob << input_k_transpose_order;
        ob << input_v_transpose_order;
        ob << output_transpose_order;
        ob << is_kv_compressed;
        ob << make_data(&quantization_attributes.quantization_type, sizeof(quantization_attributes.quantization_type));
        ob << make_data(&quantization_attributes.quantization_dt, sizeof(quantization_attributes.quantization_dt));
        ob << make_data(&quantization_attributes.scale_dt, sizeof(quantization_attributes.scale_dt));
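Why a load/save mismatch breaks the cache can be illustrated with a minimal sketch. A positional binary format has no field tags, so `load()` must read fields in exactly the order `save()` wrote them. `OutBuf`, `InBuf`, `SdpaParams` and `round_trip` below are hypothetical stand-ins (they are not OpenVINO's `BinaryOutputBuffer`/`BinaryInputBuffer` or the real primitive), and the field subset and the "buggy" ordering are simplified for illustration. They are not the exact positions touched by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical writer (NOT OpenVINO's BinaryOutputBuffer): appends raw bytes.
class OutBuf {
public:
    template <typename T>
    OutBuf& operator<<(const T& v) {
        static_assert(std::is_trivially_copyable<T>::value, "trivially copyable only");
        const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
        bytes.insert(bytes.end(), p, p + sizeof(T));
        return *this;
    }
    std::vector<std::uint8_t> bytes;
};

// Hypothetical reader: consumes bytes strictly in the order they were written.
class InBuf {
public:
    explicit InBuf(std::vector<std::uint8_t> b) : bytes_(std::move(b)) {}
    template <typename T>
    InBuf& operator>>(T& v) {
        static_assert(std::is_trivially_copyable<T>::value, "trivially copyable only");
        if (pos_ + sizeof(T) > bytes_.size()) throw std::runtime_error("read past end of buffer");
        std::memcpy(&v, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return *this;
    }
private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// Illustrative subset of SDPA-like flags; names mirror the diff only.
struct SdpaParams {
    bool is_causal = false;
    bool is_kv_compressed = false;
    bool has_attn_mask_input = false;
    bool has_scale_input = false;
    std::int64_t indirect_axis = -1;

    // Symmetric with load(): same fields, same order.
    void save(OutBuf& ob) const {
        ob << is_causal << is_kv_compressed << has_attn_mask_input << has_scale_input << indirect_axis;
    }
    // Mismatched: is_kv_compressed is written in a different slot than load() expects.
    void save_mismatched(OutBuf& ob) const {
        ob << is_causal << has_attn_mask_input << has_scale_input << is_kv_compressed << indirect_axis;
    }
    void load(InBuf& ib) {
        ib >> is_causal >> is_kv_compressed >> has_attn_mask_input >> has_scale_input >> indirect_axis;
    }
};

// Serialize with the chosen save variant, then deserialize with load().
SdpaParams round_trip(const SdpaParams& in, bool use_mismatched_save) {
    OutBuf ob;
    if (use_mismatched_save) {
        in.save_mismatched(ob);
    } else {
        in.save(ob);
    }
    InBuf ib(ob.bytes);
    SdpaParams out;
    out.load(ib);
    return out;
}
```

With `is_kv_compressed = true` and the other flags false, the symmetric round trip restores every field, while the mismatched one silently loads `is_kv_compressed == false` and `has_scale_input == true`. Both variants write the same number of bytes, so a size or end-of-buffer check alone would not catch this kind of bug; only a round-trip comparison of the fields does.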
