
Fix causal_lm cpp demo for llama architecture #71

Merged

Conversation

sammysun0711 (Collaborator)
This PR aims to fix the causal_lm cpp demo for the llama architecture.

  • Change directory name from casual_lm to causal_lm
  • Fix causal_lm cpp demo input & output mismatch for the llama architecture (see the sketch after this list)
    • llama does not require position ids as input; only chatglm requires them
    • llama takes the past KV cache as input starting from index 2 instead of 3
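For illustration, a minimal sketch of how the input layout can be inspected with the OpenVINO C++ API ("model.xml" is a placeholder path, not a file from this PR); the printed order is what motivates the index change below:

#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder for the converted llama IR.
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    // For llama the leading inputs are typically input_ids and attention_mask,
    // so the past key/value inputs start at index 2; chatglm also exposes
    // position_ids, which pushes its KV-cache inputs to index 3.
    std::vector<ov::Output<ov::Node>> inputs = model->inputs();
    for (size_t idx = 0; idx < inputs.size(); ++idx) {
        std::cout << idx << ": " << inputs.at(idx).get_any_name() << '\n';
    }
    return 0;
}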

This is an initial step to build a baseline for consolidating support for the following models into the causal_lm cpp demo:

@sammysun0711 (Collaborator, Author)

@Wovchena, @ilya-lavrenov, could you please review it?

text_generation/causal_lm/cpp/set_up_and_run.sh (outdated, review comments resolved)
text_generation/causal_lm/cpp/causal_lm.cpp (outdated, review comments resolved)
}}
};
std::vector<ov::Output<ov::Node>> inputs = model->inputs();
- for (size_t idx = 3; idx < inputs.size(); ++idx) {
+ for (size_t idx = 2; idx < inputs.size(); ++idx) {
ov::PartialShape shape = inputs.at(idx).get_partial_shape();
shape[0] = BATCH_SIZE;
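For context, a hedged reconstruction of the surrounding reshape pattern (input names, BATCH_SIZE, and the exact non-cache shapes are assumptions, not code quoted from this PR): the truncated initializer above presumably maps the non-cache input indices to fixed shapes, the loop adds the KV-cache inputs with the batch dimension pinned, and the result is applied via ov::Model::reshape.

// Sketch of the assumed surrounding code for the loop under review.
std::map<size_t, ov::PartialShape> shapes = {
    {0, ov::PartialShape{BATCH_SIZE, -1}},  // input_ids (assumed)
    {1, ov::PartialShape{BATCH_SIZE, -1}},  // attention_mask (assumed)
};
std::vector<ov::Output<ov::Node>> inputs = model->inputs();
for (size_t idx = 2; idx < inputs.size(); ++idx) {
    ov::PartialShape shape = inputs.at(idx).get_partial_shape();
    shape[0] = BATCH_SIZE;        // pin the batch dimension of each KV-cache input
    shapes.emplace(idx, shape);   // other dimensions stay as exported
}
model->reshape(shapes);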
@ilya-lavrenov (Contributor) commented Dec 13, 2023

@sammysun0711 (Collaborator, Author):

Yes, this PR is only for llama; chatglm support will be added in another PR.

@Wovchena (Collaborator) commented Dec 13, 2023:

Apparently exporting a model as stateful removes any difference between the dimensions. I was able to run chatglm2-6b and chatglm3-6b using Wovchena#8 and get meaningful output. Here's the PR updating convert.py: #52. My next step is to rewrite this causal_lm as stateful as well, so we can decide whether we need the stateless approach at all.
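For reference, with a stateful export the application no longer sets per-layer past key/value tensors at all; a minimal sketch of a prompt pass under that assumption (the tensor names and the helper function are hypothetical, not code from Wovchena#8):

#include <openvino/openvino.hpp>

// Hypothetical sketch: one prompt pass against a stateful model. The past
// key/value tensors live in the request's internal state, so only the
// regular inputs are fed and no per-layer cache plumbing is needed.
void run_prompt(ov::InferRequest& request,
                const ov::Tensor& input_ids,
                const ov::Tensor& attention_mask) {
    for (auto& state : request.query_state()) {
        state.reset();  // clear the internal KV cache before a new prompt
    }
    request.set_tensor("input_ids", input_ids);
    request.set_tensor("attention_mask", attention_mask);
    request.infer();
}

This is also why the stateful path generalizes across architectures: the cache-layout differences move into the exported model itself.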

Contributor:

I agree.
But we should wait until stateful model support is merged to HF, right?
It would be great to avoid referring to different commits to convert different models.

@sammysun0711 (Collaborator, Author):

@Wovchena, please consider that we may still need stateless models until both the CPU and GPU plugins support stateful models.

@slyalin (Collaborator) commented Dec 13, 2023:

We are going to have stateful support for both CPU and GPU.

@slyalin (Collaborator):

We are considering having separate samples for stateless models if they demonstrate popular architectures and require architecture-dependent KV-cache processing.

Collaborator:

But the main focus is on stateful models.

@ilya-lavrenov (Contributor):

So maybe we can merge the chatglm sample as is and treat it as a "sample for a stateless model of a popular architecture"?
The current one will then show stateful models and hence become more generic.

Collaborator:

Greedy search with a stateful model: sammysun0711#1. That eliminates the need for a special implementation for chatglm.

I also had the idea of keeping a stateless sample for an architecture that doesn't fit into the general scenario. And there's still Qwen, which may require that: #43. I haven't tried it yet.
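To make the greedy-search part concrete, its core is an argmax over the last position's logits; a minimal sketch assuming a [batch, seq_len, vocab_size] logits layout (an illustration, not the code from sammysun0711#1):

#include <openvino/openvino.hpp>
#include <algorithm>
#include <cstdint>

// Hypothetical helper: greedy selection of the next token from a logits
// tensor laid out as [batch, seq_len, vocab_size], with batch assumed to be 1.
int64_t greedy_next_token(ov::Tensor logits) {
    const ov::Shape shape = logits.get_shape();
    const size_t seq_len = shape[1];
    const size_t vocab_size = shape[2];
    // Point at the logits of the last generated position and take the argmax.
    const float* last = logits.data<float>() + (seq_len - 1) * vocab_size;
    return std::max_element(last, last + vocab_size) - last;
}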

@sammysun0711 (Collaborator, Author) commented Dec 14, 2023

@Wovchena, @ilya-lavrenov, @slyalin, thanks for your feedback; I've learned a lot from it.
Based on the discussion above, I will limit this PR's scope to only the following changes:

As @ilya-lavrenov suggested, we can keep the causal_lm sample for generic stateful model enabling.
Meanwhile, as @slyalin suggested, we can add ChatGLM3 and Qwen as stateless model samples that demonstrate popular architectures and require architecture-dependent KV-cache processing.

Please feel free to share any comments and feedback, thanks!

@ilya-lavrenov merged commit 424c46c into openvinotoolkit:master on Dec 14, 2023
1 check passed
@peterchen-intel added the "category: llm_bench" label (for the tool/llm_bench folder) on Dec 18, 2023