Commit: update
ivy-lv11 committed May 23, 2024
1 parent a1e34d7 commit 41bce6c
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions docs/docs/examples/llm/ipex_llm_gpu.ipynb
@@ -70,7 +70,9 @@
 "\n",
 "## `IpexLLM`\n",
 "\n",
-"Setting `device_map=\"xpu\"` when initializing `IpexLLM` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations. Use proper prompt format for zephyr-7b-alpha following the [model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)\n",
+"Setting `device_map=\"xpu\"` when initializing `IpexLLM` will put the LLM on Intel GPU and benefit from IPEX-LLM optimizations.\n",
+"\n",
+"Before loading the Zephyr model, you'll need to define `completion_to_prompt` and `messages_to_prompt` for formatting prompts, following the prompt format for zephyr-7b-alpha given in the [model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha). This is essential for preparing inputs that the model can interpret accurately. Then load the Zephyr model locally using `IpexLLM.from_model_id`. It will load the model directly in its Hugging Face format and convert it automatically to low-bit format for inference.\n",
 "\n",
 "```python\n",
 "# Transform a string into zephyr-specific input\n",
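The notebook's own prompt helpers are collapsed in the diff above. The following is a minimal, self-contained sketch of what zephyr-style formatting helpers look like, based on the template in the model card; the `SimpleNamespace` objects are a hypothetical stand-in for llama-index `ChatMessage` objects, which expose `.role` and `.content`:

```python
from types import SimpleNamespace

# Zephyr-alpha chat template (per the model card):
# <|system|> ... </s>  <|user|> ... </s>  <|assistant|>

def completion_to_prompt(completion: str) -> str:
    """Wrap a bare completion string in zephyr's chat template."""
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"

def messages_to_prompt(messages) -> str:
    """Render chat messages (objects with .role/.content) into one
    zephyr-formatted prompt string."""
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"
    # Ensure the prompt opens with a (possibly empty) system turn.
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    # Leave the assistant turn open so the model generates the reply.
    return prompt + "<|assistant|>\n"

# Example with stand-in message objects:
msgs = [
    SimpleNamespace(role="system", content="You are concise."),
    SimpleNamespace(role="user", content="Hi"),
]
print(messages_to_prompt(msgs))
```

These helpers are then passed to `IpexLLM.from_model_id` via the `completion_to_prompt` and `messages_to_prompt` arguments, as the notebook's collapsed cell does.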
@@ -142,7 +144,7 @@
 "\n",
 "Alternatively, you might save the low-bit model to disk once and use `from_model_id_low_bit` instead of `from_model_id` to reload it for later use - even across different machines. It is space-efficient, as the low-bit model demands significantly less disk space than the original model. And `from_model_id_low_bit` is also more efficient than `from_model_id` in terms of speed and memory usage, as it skips the model conversion step. \n",
 "\n",
-"To save the low-bit model, use `save_low_bit` as follows. Then load the model from saved lowbit model path as follows. Also use `device_map` to load the model to xpu. \n",
+"To save the low-bit model, use `save_low_bit` as follows. Then load the model from the saved low-bit model path. Also use `device_map` to load the model to xpu. \n",
 "> Note that the saved path for the low-bit model only includes the model itself but not the tokenizers. If you wish to have everything in one place, you will need to manually download or copy the tokenizer files from the original model's directory to the location where the low-bit model is saved.\n",
 "\n",
 "Try stream completion using the loaded low-bit model. \n",
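The save-and-reload flow this hunk describes might be sketched as below. This is hardware-dependent and not runnable here: it assumes an Intel GPU plus the `ipex-llm` and `llama-index-llms-ipex-llm` packages, and the keyword arguments and the save path are illustrative, patterned on the notebook rather than guaranteed stable across versions:

```python
from llama_index.llms.ipex_llm import IpexLLM

def completion_to_prompt(completion):
    # zephyr chat template (see the model card)
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"

# Load in Hugging Face format; weights are converted to low-bit on the fly.
llm = IpexLLM.from_model_id(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=512,
    max_new_tokens=64,
    completion_to_prompt=completion_to_prompt,
    device_map="xpu",  # place the model on Intel GPU
)

# Save only the converted low-bit weights; tokenizer files are NOT included,
# so copy them alongside if you want everything in one place.
saved_lowbit_path = "./zephyr-7b-alpha-low-bit"  # hypothetical path
llm._model.save_low_bit(saved_lowbit_path)

# Later (possibly on another machine): reload without re-converting.
llm_lowbit = IpexLLM.from_model_id_low_bit(
    model_name=saved_lowbit_path,
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",  # tokenizer from original repo
    context_window=512,
    max_new_tokens=64,
    completion_to_prompt=completion_to_prompt,
    device_map="xpu",
)

# Stream a completion token by token from the reloaded low-bit model.
for chunk in llm_lowbit.stream_complete("What is IPEX-LLM?"):
    print(chunk.delta, end="")
```

Because `from_model_id_low_bit` skips the conversion step, the reload path is faster and uses less memory than loading the original checkpoint each time.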
