Add fp16 support for split cache
PatriceVignola committed Dec 16, 2023
1 parent 7376c6a commit 06b41bc
Showing 1 changed file with 1 addition and 1 deletion.
optimum/onnxruntime/modeling_decoder.py: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ def __init__(

         self.use_fp16 = False
         for inp in model.get_inputs():
-            if inp.name == "past_key_values" and inp.type == "tensor(float16)":
+            if (inp.name == "past_key_values" or inp.name in self.key_value_input_names) and inp.type == "tensor(float16)":
                 self.use_fp16 = True
                 break
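The change above widens the fp16 check: instead of looking only for a single merged "past_key_values" input, it also matches any of the per-layer split-cache input names collected in `self.key_value_input_names`. A minimal sketch of that detection logic is below; `detect_fp16_cache` is a hypothetical standalone helper, and the `SimpleNamespace` objects stand in for the `NodeArg` entries (with `.name` and `.type` attributes) that an ONNX Runtime session's `get_inputs()` would return.

```python
from types import SimpleNamespace

def detect_fp16_cache(model_inputs, key_value_input_names):
    """Return True if any past key/value input is declared as float16.

    Mirrors the patched check: besides the legacy merged
    "past_key_values" input, per-layer split-cache inputs (whatever
    names the exported model uses) are also inspected.
    """
    for inp in model_inputs:
        if (inp.name == "past_key_values" or inp.name in key_value_input_names) \
                and inp.type == "tensor(float16)":
            return True
    return False

# Hypothetical inputs of an fp16 export that uses a split KV cache.
inputs = [
    SimpleNamespace(name="input_ids", type="tensor(int64)"),
    SimpleNamespace(name="past_key_values.0.key", type="tensor(float16)"),
    SimpleNamespace(name="past_key_values.0.value", type="tensor(float16)"),
]
kv_names = {"past_key_values.0.key", "past_key_values.0.value"}
print(detect_fp16_cache(inputs, kv_names))  # True
```

With the old condition, a model exporting its cache as separate `past_key_values.N.key` / `past_key_values.N.value` inputs would never set `use_fp16`, even when those inputs are `tensor(float16)`.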

