failed to run Llama-2-7b-chat-hf on NPU through Sample/Python #820

Open
aoke79 opened this issue Sep 4, 2024 · 8 comments

aoke79 commented Sep 4, 2024

Hi,
I failed to run Llama-2-7b-chat-hf on the NPU; please give me a hand.

  1. I converted the model with the commands below and got two models:
    a) optimum-cli export openvino --task text-generation -m Meta--Llama-2-7b-chat-hf --weight-format int4_sym_g128 --ratio 1.0 ov--Llama-2-7b-chat-hf-int4-sym-g128
    b) optimum-cli export openvino --task text-generation -m Meta--Llama-2-7b-chat-hf --weight-format int4 ov--Llama-2-7b-chat-hf-int4
  2. I ran chat_sample, benchmark_genai, and beam_search_causal_lm, and got similar results, such as:
    a) python beam_search_causal_lm.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128 "why the Sun is yellow?"
    b) python chat_sample.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128
    c) python benchmark_genai.py -m C:\AIGC\openvino\models\ov--Llama-2-7b-chat-hf-int4-sym-g128 -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d CPU

(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm>python beam_search_causal_lm.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128 "why the Sun is yellow?"
Traceback (most recent call last):
  File "c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm\beam_search_causal_lm.py", line 29, in <module>
    main()
  File "c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm\beam_search_causal_lm.py", line 24, in main
    beams = pipe.generate(args.prompts, config)
RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:79:
Check '::getPort(port, name, {_impl->get_inputs(), _impl->get_outputs()})' failed at src\inference\src\cpp\infer_request.cpp:79:
Port for tensor name beam_idx was not found.

(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai>python benchmark_genai.py -m c:\AIGC\openvino\models\TinyLlama-1.1B-Chat-v1.0\OV_FP16-4BIT_DEFAULT -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d NPU
Traceback (most recent call last):
  File "c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 49, in <module>
    main()
  File "c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 32, in main
    pipe.generate(prompt, config)
RuntimeError: Exception from C:\Jenkins\workspace\private-ci\ie\build-windows-vs2019\b\repos\openvino.genai\src\cpp\src\llm_pipeline_static.cpp:206:
Currently only batch size=1 is supported

(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python>python chat_sample.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128
Traceback (most recent call last):
  File "c:\AIGC\openvino\openvino.genai\samples\python\chat_sample.py", line 43, in <module>
    main()
  File "c:\AIGC\openvino\openvino.genai\samples\python\chat_sample.py", line 22, in main
    pipe = openvino_genai.LLMPipeline(args.model_dir, device)
RuntimeError: Exception from src\core\src\pass\stateful_to_stateless.cpp:128:
Stateful models without beam_idx input are not supported in StatefulToStateless transformation

I'm not sure whether I converted the model correctly, which is why I generated the two models with the commands above, but neither of them worked.
Could you please show me how to do this?
Thanks a lot

aoke79 (Author) commented Sep 4, 2024

pip-list.txt
Attaching the pip list FYI.
Thanks

aoke79 (Author) commented Sep 9, 2024

Can anyone please take a look at this issue?
Thanks

Wovchena (Collaborator) commented Sep 9, 2024

The --task value is incorrect for optimum-cli. Try text-generation-with-past, or don't specify --task at all; see the example command below.
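
For reference, applying that to the export command from the first comment would look something like the line below (same model path and weight-format options as above; only the --task value changes, so this is just the suggested fix spelled out, not a verified recipe):

optimum-cli export openvino --task text-generation-with-past -m Meta--Llama-2-7b-chat-hf --weight-format int4_sym_g128 --ratio 1.0 ov--Llama-2-7b-chat-hf-int4-sym-g128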

aoke79 (Author) commented Sep 10, 2024

If I remove --task text-generation, it shows the error below:

optimum-cli export openvino -m Meta--Llama-2-7b-chat-hf --weight-format int4 ov--Llama-2-7b-chat-hf-int4
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Scripts\optimum-cli.exe\__main__.py", line 7, in <module>
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Lib\site-packages\optimum\commands\optimum_cli.py", line 208, in main
    service.run()
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Lib\site-packages\optimum\commands\export\openvino.py", line 304, in run
    task = infer_task(self.args.task, self.args.model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Lib\site-packages\optimum\exporters\openvino\__main__.py", line 54, in infer_task
    task = TasksManager.infer_task_from_model(model_name_or_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Lib\site-packages\optimum\exporters\tasks.py", line 1680, in infer_task_from_model
    task = cls._infer_task_from_model_name_or_path(model, subfolder=subfolder, revision=revision)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\env_ov_optimum\Lib\site-packages\optimum\exporters\tasks.py", line 1593, in _infer_task_from_model_name_or_path
    raise RuntimeError(
RuntimeError: Cannot infer the task from a local directory yet, please specify the task manually (image-to-text, image-to-image, image-classification, audio-classification, mask-generation, feature-extraction, zero-shot-image-classification, object-detection, image-segmentation, text-to-audio, semantic-segmentation, masked-im, sentence-similarity, audio-xvector, conversational, audio-frame-classification, stable-diffusion, automatic-speech-recognition, text2text-generation, fill-mask, question-answering, multiple-choice, text-classification, text-generation, zero-shot-object-detection, token-classification, stable-diffusion-xl, depth-estimation).

aoke79 (Author) commented Sep 10, 2024

It worked with --task text-generation-with-past, as shown below:

INFO:nncf:Statistics of the bitwidth distribution:
+----------------+-----------------------------+----------------------------------------+
| Num bits (N)   | % all parameters (layers)   | % ratio-defining parameters (layers)   |
+================+=============================+========================================+
| 8              | 4% (2 / 226)                | 0% (0 / 224)                           |
+----------------+-----------------------------+----------------------------------------+
| 4              | 96% (224 / 226)             | 100% (224 / 224)                       |
+----------------+-----------------------------+----------------------------------------+
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 226/226 • 0:03:17 • 0:00:00
Set tokenizer padding side to left for text-generation-with-past task.

BTW: how can I know which parameters to use for which models?
Thanks a lot
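
As a side note on the task question: the optimum-cli traceback earlier in this thread shows the task being resolved through TasksManager.infer_task_from_model, so the same helper can be called directly to see which task optimum would pick for a given Hub model. A minimal sketch (the model ID here is only an illustration, and Hugging Face Hub access is assumed):

from optimum.exporters.tasks import TasksManager

# Ask optimum which export task it would infer for a Hub model ID.
# As the error above notes, this does not work for a plain local
# directory, which is why a local export needs --task to be set.
task = TasksManager.infer_task_from_model("meta-llama/Llama-2-7b-chat-hf")
print(task)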

aoke79 (Author) commented Sep 10, 2024

I used the newly generated model, but "benchmark_genai" still does not work with it.

python benchmark_genai.py -m C:\AIGC\hf\llama2_7b_chat_ov_int4_default_24_3 -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d NPU
Traceback (most recent call last):
  File "C:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 49, in <module>
    main()
  File "C:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 32, in main
    pipe.generate(prompt, config)
RuntimeError: Exception from C:\Jenkins\workspace\private-ci\ie\build-windows-vs2019\b\repos\openvino.genai\src\cpp\src\llm_pipeline_static.cpp:206:
Currently only batch size=1 is supported

Thanks,

TolyaTalamanov (Collaborator) commented

Hi @aoke79, the problem should already be fixed; please update the packages:

pip uninstall openvino openvino-tokenizers openvino-genai
pip install --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
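
After reinstalling, a quick way to confirm which wheels actually ended up in the environment is a standard-library version check, for example:

from importlib.metadata import version

# Print the installed version of each package the fix depends on.
for pkg in ("openvino", "openvino-tokenizers", "openvino-genai"):
    print(pkg, version(pkg))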

Panepo commented Dec 6, 2024

Hi @TolyaTalamanov,

I have updated the packages but got a similar problem: the program terminates without any message. Here's my code:

import openvino_genai as ov_genai

model_dir = "models/TinyLlama-1.1B-Chat-v1.0-int4-ov"
pipe = ov_genai.LLMPipeline(str(model_dir), "NPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 2048

message = "Good morning"
response = pipe.generate([message], config)

perf_metrics = response.perf_metrics
print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
print(f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
print(f"Tokenization time: {perf_metrics.get_tokenization_duration().mean:.2f} ± {perf_metrics.get_tokenization_duration().std:.2f} ms")
print(f"Detokenization time: {perf_metrics.get_detokenization_duration().mean:.2f} ± {perf_metrics.get_detokenization_duration().std:.2f} ms")
print(f"TTFT: {perf_metrics.get_ttft().mean:.2f} ± {perf_metrics.get_ttft().std:.2f} ms")
print(f"TPOT: {perf_metrics.get_tpot().mean:.2f} ± {perf_metrics.get_tpot().std:.2f} ms")
print(f"Throughput : {perf_metrics.get_throughput().mean:.2f} ± {perf_metrics.get_throughput().std:.2f} tokens/s")

The model was downloaded from OpenVINO's Hugging Face page.

My CPU is a Core Ultra 7 165U, the NPU driver version is 32.0.100.3104, and the platform is Win11 23H2.

Thanks
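
One way to narrow down a silent exit like this is to run the identical pipeline on CPU first: if CPU generation succeeds, the failure is NPU-specific (driver or static pipeline) rather than a model or package problem. A minimal sketch reusing the snippet above (same model path assumption; a small max_new_tokens keeps the check quick):

import openvino_genai as ov_genai

model_dir = "models/TinyLlama-1.1B-Chat-v1.0-int4-ov"

# Same pipeline as above, but on CPU, to separate model/package issues
# from NPU-specific ones.
pipe = ov_genai.LLMPipeline(model_dir, "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 64  # a short run is enough for a sanity check

print(pipe.generate(["Good morning"], config))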
