Add Llama3 #620

Merged · 2 commits · Jul 17, 2024
2 changes: 1 addition & 1 deletion samples/cpp/beam_search_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation C++ sample that supports most popular models like LLaMA 2
+# Text generation C++ sample that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. Only the inference device can be changed from the command-line interface, to GPU for example. The sample features `ov::genai::LLMPipeline` and configures it to use multiple beam groups. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
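
For orientation, a minimal sketch of what the sample configures, assuming the OpenVINO GenAI package is installed; `model_dir` and the prompt are placeholders, and the fields mirror `ov::genai::GenerationConfig`:

```cpp
#include <iostream>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    // "model_dir" is a placeholder for a directory with an exported model.
    ov::genai::LLMPipeline pipe("model_dir", "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beams = 15;            // total beams...
    config.num_beam_groups = 3;       // ...split into 3 groups of 5
    config.diversity_penalty = 1.0f;  // push groups toward different continuations

    std::cout << pipe.generate("Why is the Sun yellow?", config) << '\n';
}
```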

2 changes: 1 addition & 1 deletion samples/cpp/chat_sample/README.md
@@ -1,4 +1,4 @@
-# C++ chat_sample that supports most popular models like LLaMA 2
+# C++ chat_sample that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `ov::genai::LLMPipeline` and configures it for the chat scenario. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
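
A minimal sketch of the chat loop, assuming an exported model in the placeholder `model_dir`; `start_chat()` and `finish_chat()` bracket the conversation so the history is carried between turns:

```cpp
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    ov::genai::LLMPipeline pipe("model_dir", "CPU");  // "model_dir" is a placeholder
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    pipe.start_chat();  // history accumulates until finish_chat()
    for (std::string prompt; std::getline(std::cin, prompt);) {
        std::cout << pipe.generate(prompt, config) << '\n';
    }
    pipe.finish_chat();
}
```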

2 changes: 1 addition & 1 deletion samples/cpp/greedy_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation C++ greedy_causal_lm that supports most popular models like LLaMA 2
+# Text generation C++ greedy_causal_lm that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `ov::genai::LLMPipeline` and configures it to run the simplest deterministic greedy decoding algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
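
Greedy decoding needs no sampling parameters: with the defaults of a single beam and `do_sample == false`, the pipeline takes the arg-max token at each step. A sketch, with `model_dir` as a placeholder:

```cpp
#include <iostream>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    ov::genai::LLMPipeline pipe("model_dir", "CPU");  // "model_dir" is a placeholder
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // defaults elsewhere: num_beams == 1, do_sample == false
    std::cout << pipe.generate("What is OpenVINO?", config) << '\n';
}
```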

2 changes: 1 addition & 1 deletion samples/cpp/multinomial_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation C++ multinomial_causal_lm that supports most popular models like LLaMA 2
+# Text generation C++ multinomial_causal_lm that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `ov::genai::LLMPipeline` and configures it to run the random sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
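
A sketch of the sampling setup with placeholder values: `do_sample = true` switches from greedy decoding to drawing each token from the model's probability distribution, filtered by the usual knobs:

```cpp
#include <iostream>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    ov::genai::LLMPipeline pipe("model_dir", "CPU");  // "model_dir" is a placeholder
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.do_sample = true;    // sample instead of taking the arg-max
    config.temperature = 0.8f;  // values below 1 sharpen the distribution
    config.top_p = 0.9f;        // keep the smallest set with cumulative probability 0.9
    config.top_k = 50;          // and at most the 50 most likely tokens
    std::cout << pipe.generate("What is OpenVINO?", config) << '\n';
}
```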

2 changes: 1 addition & 1 deletion samples/cpp/prompt_lookup_decoding_lm/README.md
@@ -1,4 +1,4 @@
-# prompt_lookup_decoding_lm C++ sample that supports most popular models like LLaMA 2
+# prompt_lookup_decoding_lm C++ sample that supports most popular models like LLaMA 3

[Prompt Lookup decoding](https://github.com/apoorvumang/prompt-lookup-decoding) is an [assisted-generation](https://huggingface.co/blog/assisted-generation#understanding-text-generation-latency) technique in which the draft model is replaced with simple string matching over the prompt to generate candidate token sequences. This method is highly effective for input-grounded generation (summarization, document QA, multi-turn chat, code editing), where there is high n-gram overlap between the LLM input (prompt) and the LLM output. The overlap could be entity names, phrases, or code chunks that the LLM directly copies from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs, yielding significant speedups with no effect on output quality.
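
The core of the idea fits in a few lines. Below is a self-contained sketch of the candidate lookup (a hypothetical helper, not this sample's actual code): take the last few generated tokens as an n-gram, search for it earlier in the token stream, and propose the tokens that followed the match as the draft. The main model then validates all candidates in a single forward pass, so mispredictions cost little.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper illustrating prompt lookup: if the most recent
// ngram_size tokens also occur earlier in `tokens` (prompt plus output so
// far), propose up to num_candidates tokens that followed that occurrence.
std::vector<int64_t> propose_candidates(const std::vector<int64_t>& tokens,
                                        size_t ngram_size,
                                        size_t num_candidates) {
    if (tokens.size() <= ngram_size)
        return {};
    auto tail = tokens.end() - ngram_size;  // the n-gram to look up
    for (size_t pos = 0; pos + ngram_size < tokens.size(); ++pos) {
        if (std::equal(tail, tokens.end(), tokens.begin() + pos)) {
            size_t start = pos + ngram_size;  // first token after the match
            size_t count = std::min(num_candidates, tokens.size() - start);
            return std::vector<int64_t>(tokens.begin() + start,
                                        tokens.begin() + start + count);
        }
    }
    return {};  // no match: fall back to ordinary one-token decoding
}
```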

2 changes: 1 addition & 1 deletion samples/cpp/speculative_decoding_lm/README.md
@@ -1,4 +1,4 @@
-# speculative_decoding_lm C++ sample that supports most popular models like LLaMA 2
+# speculative_decoding_lm C++ sample that supports most popular models like LLaMA 3

Speculative decoding (or [assisted-generation](https://huggingface.co/blog/assisted-generation#understanding-text-generation-latency) in HF terminology) is a recent technique that speeds up token generation by running an additional, smaller draft model alongside the main model.
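
To make the mechanism concrete, here is a conceptual sketch of one verification step; `draft` and `verify` are hypothetical stand-ins for the two model calls, not this sample's actual code:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// `draft` cheaply proposes k next tokens; `verify` runs the main model once
// over context + proposals and returns the main model's token for each of
// the k + 1 positions. Both callbacks are hypothetical stand-ins.
size_t speculative_step(
    std::vector<int64_t>& context, size_t k,
    const std::function<std::vector<int64_t>(const std::vector<int64_t>&, size_t)>& draft,
    const std::function<std::vector<int64_t>(const std::vector<int64_t>&,
                                             const std::vector<int64_t>&)>& verify) {
    std::vector<int64_t> proposed = draft(context, k);
    std::vector<int64_t> checked = verify(context, proposed);  // k + 1 tokens
    // Accept the longest prefix on which both models agree; the first
    // mismatch is replaced by the main model's own choice, so each step
    // emits at least one token and greedy output matches plain decoding.
    size_t accepted = 0;
    while (accepted < proposed.size() && checked[accepted] == proposed[accepted])
        ++accepted;
    context.insert(context.end(), checked.begin(), checked.begin() + accepted + 1);
    return accepted + 1;  // tokens emitted this step
}
```

In the best case all k proposals are accepted, and the main model emits k + 1 tokens for the cost of a single forward pass.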

2 changes: 1 addition & 1 deletion samples/python/beam_search_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation Python sample that supports most popular models like LLaMA 2
+# Text generation Python sample that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. Only the inference device can be changed from the command-line interface, to GPU for example. The sample features `openvino_genai.LLMPipeline` and configures it to use multiple beam groups. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.

2 changes: 1 addition & 1 deletion samples/python/chat_sample/README.md
@@ -1,4 +1,4 @@
-# Python chat_sample that supports most popular models like LLaMA 2
+# Python chat_sample that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `openvino_genai.LLMPipeline` and configures it for the chat scenario. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.

2 changes: 1 addition & 1 deletion samples/python/greedy_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation Python greedy_causal_lm that supports most popular models like LLaMA 2
+# Text generation Python greedy_causal_lm that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `openvino_genai.LLMPipeline` and configures it to run the simplest deterministic greedy decoding algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.

2 changes: 1 addition & 1 deletion samples/python/multinomial_causal_lm/README.md
@@ -1,4 +1,4 @@
-# Text generation Python multinomial_causal_lm that supports most popular models like LLaMA 2
+# Text generation Python multinomial_causal_lm that supports most popular models like LLaMA 3

This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `openvino_genai.LLMPipeline` and configures it to run the random sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.

12 changes: 12 additions & 0 deletions src/docs/SUPPORTED_MODELS.md
@@ -46,6 +46,18 @@
</tr>
<tr>
<td rowspan="3" vertical-align="top"><code>LlamaForCausalLM</code></td>
<td>Llama 3</td>
<td>
<ul>
<li><a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B"><code>meta-llama/Meta-Llama-3-8B</code></a></li>
<li><a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct"><code>meta-llama/Meta-Llama-3-8B-Instruct</code></a></li>
<li><a href="https://huggingface.co/meta-llama/Meta-Llama-3-70B"><code>meta-llama/Meta-Llama-3-70B</code></a></li>
<li><a href="https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct"><code>meta-llama/Meta-Llama-3-70B-Instruct</code></a></li>
</ul>
</td>
</tr>
<tr>
<!-- <td><code>LlamaForCausalLM</code></td> -->
<td>Llama 2</td>
<td>
<ul>