-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept buffer in LLMPipeline ctor #1262
Accept buffer in LLMPipeline ctor #1262
Conversation
413c015
to
2cdf0d3
Compare
ov::Core core; | ||
const char* ov_tokenizer_path = getenv(ScopedVar::ENVIRONMENT_VARIABLE_NAME); | ||
OPENVINO_ASSERT(ov_tokenizer_path, "openvino_tokenizers path is not set"); | ||
core.add_extension(ov_tokenizer_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that default singleton from utils.hpp can include tokenizer extension to avoid duplication of logic of multiples Cores creation.
// This is necessary only for NPU, for other plugins can be ommited. | ||
ov::AnyMap model_descr_properties = {{"name_or_path", "meta-llama/Llama-2-7b-chat-hf"}, | ||
{"type", "llama"}, | ||
{"num_key_value_heads", 32}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can avoid demonstrating such NPU hacks in samples?
I hope that @dmatveev and @TolyaTalamanov can come up with more generic approach which does not require reading this information from config.json
. E.g. num_key_value_heads
can be extracted from model itself
openvino.genai/src/cpp/src/utils/paged_attention_transformations.cpp
Lines 23 to 25 in e2fa0d0
ov::PartialShape k_shape = parameters[kv_caches_inputs_offset]->get_partial_shape(); | |
OPENVINO_ASSERT(k_shape.rank().get_length() == 3, "KV cache shape is expected to have rank 3, while shape is ", k_shape); | |
size_t num_kv_heads = k_shape[1].get_length(), head_size = k_shape[2].get_length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this from the sample. In order not to forget about that created a ticket for NPU
to search for a better solution.
@TolyaTalamanov @dmatveev
2d4496a
Ticket: CVS-158144, CVS-158142
- ~#1302 (didn't port this PR because of the issue CVS-159227) - #1262 - #1336 - #1331 --------- Co-authored-by: Andrei Kochin <[email protected]> Co-authored-by: Vladimir Zlobin <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]>
- ~openvinotoolkit#1302 (didn't port this PR because of the issue CVS-159227) - openvinotoolkit#1262 - openvinotoolkit#1336 - openvinotoolkit#1331 --------- Co-authored-by: Andrei Kochin <[email protected]> Co-authored-by: Vladimir Zlobin <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]>
- ~openvinotoolkit/openvino.genai#1302 (didn't port this PR because of the issue CVS-159227) - openvinotoolkit/openvino.genai#1262 - openvinotoolkit/openvino.genai#1336 - openvinotoolkit/openvino.genai#1331 --------- Co-authored-by: Andrei Kochin <[email protected]> Co-authored-by: Vladimir Zlobin <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]>
Ticket: CVS-158144, CVS-158142