failed to run Llama-2-7b-chat-hf on NPU through Sample/Python #820
Comments
pip-list.txt
Can anyone please take a look at this issue?
The --task value is incorrect for optimum-cli. Try text-generation-with-past, or don't specify the task at all.
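For reference, a corrected export command might look like the sketch below (reusing the local model folder and output directory that appear elsewhere in this thread):
optimum-cli export openvino -m Meta--Llama-2-7b-chat-hf --task text-generation-with-past --weight-format int4 ov--Llama-2-7b-chat-hf-int4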
If --task text-generation is removed, running: optimum-cli export openvino -m Meta--Llama-2-7b-chat-hf --weight-format int4 ov--Llama-2-7b-chat-hf-int4 shows the output below.
It worked with --task text-generation-with-past, as below: INFO:nncf:Statistics of the bitwidth distribution: BTW: how can I know which parameters to use for which models?
I used the newly generated model, but "benchmark_genai" still does not work with it: python benchmark_genai.py -m C:\AIGC\hf\llama2_7b_chat_ov_int4_default_24_3 -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d NPU Thanks,
Hi @aoke79, the problem should be fixed already; please update your packages:
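The exact upgrade commands are not captured above; a plausible sketch, assuming the relevant PyPI packages are openvino, openvino-genai, and optimum-intel, would be:
pip install -U openvino openvino-genai optimum-intel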
Hi @TolyaTalamanov, I have updated the packages but got a similar problem: the program terminates without any message. Here's my code:
The model was downloaded from OpenVINO on Hugging Face. My CPU is an Ultra 7 165U, the NPU driver version is 32.0.100.3104, and the platform is Win11 23H2. Thanks
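For context, here is a minimal sketch of the kind of call involved, based on the openvino_genai API used by the samples and visible in the tracebacks below; the model path and generation settings are placeholders, not the exact code that was run:

import openvino_genai

# Load the exported OpenVINO model on the NPU device (the path is a placeholder).
pipe = openvino_genai.LLMPipeline(r"C:\AIGC\hf\llama2_7b_chat_ov_int4_default_24_3", "NPU")

# Cap the number of generated tokens, mirroring the -mt option of benchmark_genai.py.
config = openvino_genai.GenerationConfig()
config.max_new_tokens = 200

# Generate a completion for a single prompt (the static NPU pipeline expects batch size 1).
print(pipe.generate("why the Sun is yellow?", config))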
Dear all,
I failed to run Llama-2-7b-chat-hf on NPU, please give me a hand.
I converted the model in two ways:
a) optimum-cli export openvino --task text-generation -m Meta--Llama-2-7b-chat-hf --weight-format int4_sym_g128 --ratio 1.0 ov--Llama-2-7b-chat-hf-int4-sym-g128
b) optimum-cli export openvino --task text-generation -m Meta--Llama-2-7b-chat-hf --weight-format int4 ov--Llama-2-7b-chat-hf-int4
Then I ran the samples:
a) python beam_search_causal_lm.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128 "why the Sun is yellow?"
b) python chat_sample.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128
c) python benchmark_genai.py -m C:\AIGC\openvino\models\ov--Llama-2-7b-chat-hf-int4-sym-g128 -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d CPU
(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm>python beam_search_causal_lm.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128 "why the Sun is yellow?"
Traceback (most recent call last):
File "c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm\beam_search_causal_lm.py", line 29, in
main()
File "c:\AIGC\openvino\openvino.genai\samples\python\beam_search_causal_lm\beam_search_causal_lm.py", line 24, in main
beams = pipe.generate(args.prompts, config)
RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:79:
Check '::getPort(port, name, {_impl->get_inputs(), _impl->get_outputs()})' failed at src\inference\src\cpp\infer_request.cpp:79:
Port for tensor name beam_idx was not found.
(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai>python benchmark_genai.py -m c:\AIGC\openvino\models\TinyLlama-1.1B-Chat-v1.0\OV_FP16-4BIT_DEFAULT -p "why the Sun is yellow?" -nw 1 -n 1 -mt 200 -d NPU
Traceback (most recent call last):
File "c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 49, in
main()
File "c:\AIGC\openvino\openvino.genai\samples\python\benchmark_genai\benchmark_genai.py", line 32, in main
pipe.generate(prompt, config)
RuntimeError: Exception from C:\Jenkins\workspace\private-ci\ie\build-windows-vs2019\b\repos\openvino.genai\src\cpp\src\llm_pipeline_static.cpp:206:
Currently only batch size=1 is supported
(env_ov_genai) c:\AIGC\openvino\openvino.genai\samples\python>python chat_sample.py c:\AIGC\hf\ov--Llama-2-7b-chat-hf-int4-sym-g128
Traceback (most recent call last):
File "c:\AIGC\openvino\openvino.genai\samples\python\chat_sample.py", line 43, in
main()
File "c:\AIGC\openvino\openvino.genai\samples\python\chat_sample.py", line 22, in main
pipe = openvino_genai.LLMPipeline(args.model_dir, device)
RuntimeError: Exception from src\core\src\pass\stateful_to_stateless.cpp:128:
Stateful models without beam_idx input are not supported in StatefulToStateless transformation

I'm not sure if I converted the model correctly, so I generated the two models with the command lines above, but neither of them worked.
Could you please show me how to do that?
Thanks a lot