Fix causal_lm cpp demo for llama architecture #71
Conversation
@Wovchena, @ilya-lavrenov, could you please review it?
```diff
     }}
 };
 std::vector<ov::Output<ov::Node>> inputs = model->inputs();
-for (size_t idx = 3; idx < inputs.size(); ++idx) {
+for (size_t idx = 2; idx < inputs.size(); ++idx) {
     ov::PartialShape shape = inputs.at(idx).get_partial_shape();
     shape[0] = BATCH_SIZE;
```
As far as I remember, chatglm has a different index for the batch dimension:
https://github.com/openvinotoolkit/openvino.genai/pull/48/files#diff-bceea0d0f5f31cbb280e385123d098ecddc2ee4bcc57a87b6789bce72a8b481cR85
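For illustration, one way to handle that would be to look the batch-dimension index up per architecture before reshaping. A minimal sketch, assuming a hypothetical kv_batch_dim table and architecture string (the chatglm value of 1 is an assumption, not taken from this PR):

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

#include <openvino/openvino.hpp>

constexpr size_t BATCH_SIZE = 1;

// Sketch: pick the KV-cache batch dimension per architecture instead of
// hard-coding 0. The table values here are placeholders.
void reshape_kv_inputs(std::shared_ptr<ov::Model>& model, const std::string& arch) {
    static const std::map<std::string, size_t> kv_batch_dim{{"llama", 0}, {"chatglm", 1}};
    const size_t batch_dim = kv_batch_dim.at(arch);
    // Inputs 0 and 1 are assumed to be input_ids and attention_mask;
    // everything after them is treated as a KV-cache input.
    std::map<size_t, ov::PartialShape> shapes{
        {0, ov::PartialShape{BATCH_SIZE, -1}},
        {1, ov::PartialShape{BATCH_SIZE, -1}},
    };
    std::vector<ov::Output<ov::Node>> inputs = model->inputs();
    for (size_t idx = 2; idx < inputs.size(); ++idx) {
        ov::PartialShape shape = inputs.at(idx).get_partial_shape();
        shape[batch_dim] = BATCH_SIZE;
        shapes.emplace(idx, shape);
    }
    model->reshape(shapes);
}
```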
Yes, this PR is only for llama; chatglm support will be added in another PR.
Apparently exporting a model as stateful removes any difference between the dimensions. I was able to run chatglm2-6b and chatglm3-6b using Wovchena#8 and get meaningful output. Here's the PR updating convert.py: #52. My next step is to rewrite this causal_lm as stateful as well, so we can decide whether we need the stateless approach at all.
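A quick way to check this for a given export is to print the remaining model inputs; for a stateful model no per-layer past_key_values inputs should be left, so there is no cache batch dimension to reshape at all (the model file name below is a placeholder):

```cpp
#include <iostream>
#include <memory>

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("openvino_model.xml");
    // For a stateful export the KV-cache is held in ReadValue/Assign state,
    // so only the token-level inputs should show up here (no past_key_values.*).
    for (const ov::Output<ov::Node>& input : model->inputs()) {
        std::cout << input.get_any_name() << ": " << input.get_partial_shape() << '\n';
    }
}
```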
I agree.
But we should wait until stateful model support is merged to HF, right?
It would be great to avoid referring to different commits to convert different models.
@Wovchena, please consider that we may still need stateless models until both the CPU and GPU plugins support stateful models.
We are going to have stateful support for both CPU and GPU.
We are considering having separate samples for stateless models if they demonstrate popular architectures and require architecture-dependent kv-cache processing.
But the main focus is on stateful models.
So maybe we can merge the chatglm sample as is and treat it as a "sample for a stateless model of a popular architecture"?
The current one would then show stateful models and hence become more generic.
Greedy search with a stateful model: sammysun0711#1. That eliminates the need for a special implementation for chatglm.
I also had the idea of keeping a stateless sample for an architecture that doesn't fit the general scenario. And there's still Qwen, which may require that: #43. I haven't tried it yet.
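For reference, greedy search against a stateful model reduces to roughly the following sketch (the input/output names, the absence of position_ids, and the EOS handling are assumptions about the export; a real sample also needs the tokenizer and detokenizer):

```cpp
#include <algorithm>
#include <vector>

#include <openvino/openvino.hpp>

// Index of the highest logit for the last generated position.
int64_t argmax_last_token(ov::Tensor logits) {
    size_t vocab_size = logits.get_shape().back();
    float* data = logits.data<float>();
    float* last = data + logits.get_size() - vocab_size;
    return std::max_element(last, last + vocab_size) - last;
}

// Greedy search with a stateful model: the KV-cache lives in the request
// state, so after the prompt only the newly generated token is fed in.
void greedy_generate(ov::InferRequest& request, std::vector<int64_t> tokens,
                     int64_t eos_token_id, size_t max_new_tokens) {
    std::vector<int64_t> mask;
    for (size_t step = 0; step < max_new_tokens; ++step) {
        size_t new_len = step == 0 ? tokens.size() : 1;
        int64_t* new_data = step == 0 ? tokens.data() : &tokens.back();
        request.set_tensor("input_ids", ov::Tensor(ov::element::i64, {1, new_len}, new_data));
        mask.assign(tokens.size(), 1);  // mask covers the full sequence so far
        request.set_tensor("attention_mask",
                           ov::Tensor(ov::element::i64, {1, mask.size()}, mask.data()));
        request.infer();
        int64_t next = argmax_last_token(request.get_tensor("logits"));
        if (next == eos_token_id) {
            break;
        }
        tokens.push_back(next);
    }
    request.reset_state();  // drop the cached KV state before the next prompt
}
```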
@Wovchena, @ilya-lavrenov, @slyalin, thanks for your feedback. I've learned a lot from it.
As @ilya-lavrenov suggested, we can keep the causal_lm sample for generic stateful model enabling.
Please feel free to share any comments and feedback, thanks!
This PR aims to fix the causal_lm cpp demo for the llama architecture and renames casual_lm to causal_lm.
This is an initial step to build a baseline for consolidating support for the following models into the causal_lm cpp demo: