
Merge releases/2024/3 into master #720

Merged
60 commits
e4637b3
Workaround (#618)
Wovchena Jul 15, 2024
423c8e3
Revert to python3
Wovchena Jul 15, 2024
8ad336c
Revert to python3 (#622)
akladiev Jul 15, 2024
1b1b2f0
Fix cmake Python var name (#624)
Wovchena Jul 15, 2024
70b74ad
Add ContinuousBatchingPipeline constructor similar to LLMPipeline (#604)
Wovchena Jul 15, 2024
f0c2677
Clear beam search info when generate() is finished. (#630)
popovaan Jul 15, 2024
73badf6
Update nncf_utils.py (#616) (#633)
KodiaqQ Jul 16, 2024
25655e3
Workaround cmake packaging (#634)
Wovchena Jul 16, 2024
754f6d7
Save licensing_genai into docs to align with OpenVINO (#637)
Wovchena Jul 16, 2024
e5247e0
Update submodule (#638)
Wovchena Jul 16, 2024
2d1fa3b
Add Llama3 (#620)
Wovchena Jul 17, 2024
489a87d
nightly->rc1 (#621)
Wovchena Jul 17, 2024
67f0467
Add OpenVINOGenAITargets to core_genai_dev COMPONENT (#642)
Wovchena Jul 17, 2024
1969160
Apply todo, initialize detokenizer's cache (#647)
Wovchena Jul 22, 2024
0e0f6a9
Cherry-pick static LLM pipeline changes (#654)
TolyaTalamanov Jul 22, 2024
cb100cb
[Continuous batching] Replace standard max_element call with custom l…
mzegla Jul 11, 2024
f0e4190
wip
pavel-esir Jul 12, 2024
7cab496
add detokenization metric; refactor split to perf_conter & perf_metrics
pavel-esir Jul 19, 2024
bb1113c
refactor structure, add python sample
pavel-esir Jul 22, 2024
7bf42f1
Cherry-pick custom max_element loop (#662)
mzegla Jul 22, 2024
0a8f0d9
add more preicise durations
pavel-esir Jul 22, 2024
bad01b9
Add note for pybind ov::Tensor issue (#659)
as-suvorov Jul 22, 2024
cb0da0a
[OV 24.3]Fix multinomial sample CMakeList (#658)
sammysun0711 Jul 22, 2024
bc92248
add Readme for tests (#664)
pavel-esir Jul 23, 2024
90320f4
add cpp Readme, ensured correct batch processing, add PerfMetrics to …
pavel-esir Jul 23, 2024
aeec730
use MeanStdPair
pavel-esir Jul 23, 2024
56eeafc
[2024.3] Fix symbol encode error (#629)
yatarkan Jul 24, 2024
8934a0e
[release branch] Add infer request queue for tokenizers and allow for…
dkalinowski Jul 24, 2024
12f8e44
Add max_new_tokens to every generate call in src/README.md (#670)
pavel-esir Jul 24, 2024
f9e45e1
Add CB naive chat (#644)
Wovchena Jul 24, 2024
03590c5
return back py::object -> AnyMap (#679)
pavel-esir Jul 24, 2024
53945f7
Update openvino_tokenizers (#680)
Wovchena Jul 24, 2024
a769b33
Allow dev and rc tokenizers (#681)
Wovchena Jul 24, 2024
e449ffe
Fix chat templates with slices, add tokenizer config for `mistralai/M…
yatarkan Jul 25, 2024
406393f
Prefix caching. (#675)
popovaan Jul 26, 2024
c45aed5
Merge remote-tracking branch 'upstream/releases/2024/3' into add_perf…
pavel-esir Jul 26, 2024
be2fdaf
resolve conflicts
pavel-esir Jul 26, 2024
b00bcd8
apply comments
pavel-esir Jul 26, 2024
60e7188
uset getter and cache evaluate results
pavel-esir Jul 26, 2024
e553ef5
update Readme's
pavel-esir Jul 26, 2024
3bfbab5
StaticLLMPipeline dangling models hotfix (#693)
TolyaTalamanov Jul 26, 2024
102f00a
add generation time metrics (#613)
andrei-kochin Jul 26, 2024
06c57b7
Remove Dockerfile (#700)
mzegla Jul 29, 2024
e286469
StaticLLMPipeline - align u4 zero points (#705)
TolyaTalamanov Jul 30, 2024
2a80828
Disable broken test (#707)
Wovchena Jul 31, 2024
d89cdcb
update optimum commit for releases/2024/3 (#711)
eaidova Jul 31, 2024
2428a3a
change commit for optimum
eaidova Jul 31, 2024
1473e7f
Merge branch 'releases/2024/3' into ea/upd_opt_commit
eaidova Jul 31, 2024
8cb12b2
change commit for optimum (#714)
andrei-kochin Jul 31, 2024
2f778f3
Add perf metric docstrings (#713)
pavel-esir Jul 31, 2024
2dc6b64
rc1->rc2 (#695)
Wovchena Jul 31, 2024
3bfdd3f
Docs for version compatibility (#692)
yatarkan Jul 31, 2024
a9d6541
Merge branch 'releases/2024/3' into merge-releases/2024/3-into-master
Wovchena Jul 31, 2024
e76f9f9
coorect after git merge conflict resolution
Wovchena Jul 31, 2024
9a0b7e9
coorect after git merge conflict resolution
Wovchena Jul 31, 2024
a88cfc8
coorect after git merge conflict resolution
Wovchena Jul 31, 2024
2311f6e
coorect after git merge conflict resolution
Wovchena Jul 31, 2024
45937bc
cache_size
Wovchena Aug 1, 2024
4962039
skip
Wovchena Aug 1, 2024
b5f21dc
Correct links and typos
Wovchena Aug 1, 2024
38 changes: 0 additions & 38 deletions Dockerfile

This file was deleted.

1 change: 1 addition & 0 deletions samples/CMakeLists.txt
@@ -10,6 +10,7 @@ add_subdirectory(cpp/greedy_causal_lm)
add_subdirectory(cpp/multinomial_causal_lm)
add_subdirectory(cpp/prompt_lookup_decoding_lm)
add_subdirectory(cpp/speculative_decoding_lm)
add_subdirectory(cpp/benchmark_genai)

install(FILES requirements.txt DESTINATION samples
    COMPONENT cpp_samples_genai)
24 changes: 24 additions & 0 deletions samples/cpp/benchmark_genai/CMakeLists.txt
@@ -0,0 +1,24 @@
# Copyright (C) 2023-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


find_package(OpenVINOGenAI REQUIRED PATHS
"${CMAKE_BINARY_DIR}" # Reuse the package from the build.
${OpenVINO_DIR} # GenAI may be installed alogside OpenVINO.
)

include(FetchContent)

FetchContent_Declare(cxxopts
    URL https://github.com/jarro2783/cxxopts/archive/refs/tags/v3.1.1.tar.gz
    URL_HASH SHA256=523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08)
FetchContent_MakeAvailable(cxxopts)

add_executable(benchmark_genai benchmark_genai.cpp)
target_link_libraries(benchmark_genai PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_genai PROPERTIES
    COMPILE_PDB_NAME benchmark_genai
    # Ensure out of box LC_RPATH on macOS with SIP
    INSTALL_RPATH_USE_LINK_PATH ON)
install(TARGETS benchmark_genai
    RUNTIME DESTINATION samples_bin/
    COMPONENT samples_bin
    EXCLUDE_FROM_ALL)
47 changes: 47 additions & 0 deletions samples/cpp/benchmark_genai/README.md
@@ -0,0 +1,47 @@
# LLMs benchmarking sample

This sample demonstrates how to benchmark an LLM in OpenVINO GenAI. It includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Usage

```sh
benchmark_genai [OPTIONS]
```

### Options

- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens to generate.
- `-n, --num_iter` (default: `3`): Number of iterations.
- `-d, --device` (default: `"CPU"`): Device to run the model on.

### Output:

```
benchmark_genai -m TinyLlama-1.1B-Chat-v1.0 -n 10
```

```
Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
Tokenization time: 0.51 ± 0.02 ms
Detokenization time: 0.37 ± 0.01 ms
TTFT: 81.60 ± 0.54 ms
TPOT: 71.52 ± 2.72 ms
Throughput: 13.98 ± 0.53 tokens/s
```

For more information on how performance metrics are calculated, please follow the [performance-metrics tutorial](../../../src/README.md#performance-metrics).
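The mean ± std values shown above are aggregates over the measured iterations. As a rough illustration of how such a pair can be formed (a plain-Python sketch for intuition only; the exact formula used by `PerfMetrics`/`MeanStdPair` may differ, e.g. sample vs. population standard deviation):

```python
# Illustrative only: turn per-iteration generate durations into a "mean ± std" pair
# like "Generate time: 1430.77 ± 3.04 ms" in the sample output above.
import statistics

def mean_std(durations_ms):
    mean = statistics.mean(durations_ms)
    # stdev() needs at least two samples; report 0.0 for a single measurement.
    std = statistics.stdev(durations_ms) if len(durations_ms) > 1 else 0.0
    return mean, std

durations_ms = [1432.1, 1428.9, 1431.3]  # hypothetical per-iteration measurements
mean, std = mean_std(durations_ms)
print(f"Generate time: {mean:.2f} ± {std:.2f} ms")
```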
70 changes: 70 additions & 0 deletions samples/cpp/benchmark_genai/benchmark_genai.cpp
Contributor: do we have CI runs for these new samples? I don't see it

Collaborator (Author): No. It was last hour merge

Contributor: do we have a task for it?

Collaborator (Author): Created 148650

@@ -0,0 +1,70 @@
// Copyright (C) 2023-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include "openvino/genai/llm_pipeline.hpp"
#include <cxxopts.hpp>

int main(int argc, char* argv[]) try {
    cxxopts::Options options("benchmark_vanilla_genai", "Help command");

    options.add_options()
        ("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>()->default_value("."))
        ("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("The Sky is blue because"))
        ("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
        ("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
        ("mt,max_new_tokens", "Maximal number of new tokens", cxxopts::value<size_t>()->default_value(std::to_string(20)))
        ("d,device", "device", cxxopts::value<std::string>()->default_value("CPU"))
        ("h,help", "Print usage");

    cxxopts::ParseResult result;
    try {
        result = options.parse(argc, argv);
    } catch (const cxxopts::exceptions::exception& e) {
        std::cout << e.what() << "\n\n";
        std::cout << options.help() << std::endl;
        return EXIT_FAILURE;
    }

    if (result.count("help")) {
        std::cout << options.help() << std::endl;
        return EXIT_SUCCESS;
    }

    std::string prompt = result["prompt"].as<std::string>();
    const std::string model_path = result["model"].as<std::string>();
    std::string device = result["device"].as<std::string>();
    size_t num_warmup = result["num_warmup"].as<size_t>();
    size_t num_iter = result["num_iter"].as<size_t>();

    ov::genai::GenerationConfig config;
    config.max_new_tokens = result["max_new_tokens"].as<size_t>();

    ov::genai::LLMPipeline pipe(model_path, device);

    for (size_t i = 0; i < num_warmup; i++)
        pipe.generate(prompt, config);

    ov::genai::DecodedResults res = pipe.generate(prompt, config);
    ov::genai::PerfMetrics metrics = res.perf_metrics;
    for (size_t i = 0; i < num_iter - 1; i++) {
        res = pipe.generate(prompt, config);
        metrics = metrics + res.perf_metrics;
    }

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
    std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
    std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
    std::cout << "Detokenization time: " << metrics.get_detokenization_duration().mean << " ± " << metrics.get_detokenization_duration().std << " ms" << std::endl;
    std::cout << "TTFT: " << metrics.get_ttft().mean << " ± " << metrics.get_ttft().std << " ms" << std::endl;
    std::cout << "TPOT: " << metrics.get_tpot().mean << " ± " << metrics.get_tpot().std << " ms/token" << std::endl;
    std::cout << "Throughput: " << metrics.get_throughput().mean << " ± " << metrics.get_throughput().std << " tokens/s" << std::endl;

    return 0;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return EXIT_FAILURE;
} catch (...) {
    std::cerr << "Non-exception object thrown\n";
    return EXIT_FAILURE;
}
14 changes: 13 additions & 1 deletion samples/python/beam_search_causal_lm/README.md
@@ -17,8 +17,20 @@ optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B

`beam_search_causal_lm.py TinyLlama-1.1B-Chat-v1.0 "Why is the Sun yellow?"`

To enable Unicode characters for Windows cmd open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### Troubleshooting

#### Unicode characters encoding error on Windows

Example error:
```
UnicodeEncodeError: 'charmap' codec can't encode character '\u25aa' in position 0: character maps to <undefined>
```

If you encounter the error described in the example when the sample is printing output to the Windows console, it is likely due to the default Windows encoding not supporting certain Unicode characters. To resolve this:
1. Enable Unicode characters for Windows cmd - open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.
2. Enable UTF-8 mode by setting environment variable `PYTHONIOENCODING="utf8"` (an in-script alternative is sketched below).
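If changing the environment variable is inconvenient, the standard streams can also be reconfigured from inside a Python script (3.7+). A minimal sketch of this alternative workaround (not part of the sample itself):

```python
# Alternative to PYTHONIOENCODING: force UTF-8 output at the start of the script.
# io.TextIOWrapper.reconfigure() is available since Python 3.7.
import sys

sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")

print("\u25aa UTF-8 output now works on the Windows console")
```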
47 changes: 47 additions & 0 deletions samples/python/benchmark_genai/README.md
@@ -0,0 +1,47 @@
# LLMs benchmarking sample

This sample script demonstrates how to benchmark an LLM in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Usage

```sh
python benchmark_genai.py [OPTIONS]
```

### Options

- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-n, --num_iter` (default: `3`): Number of iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens to generate.
- `-d, --device` (default: `"CPU"`): Device to run the model on.

### Output:

```
python benchmark_genai.py -m TinyLlama-1.1B-Chat-v1.0 -n 10
```

```
Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
Tokenization time: 0.51 ± 0.02 ms
Detokenization time: 0.37 ± 0.01 ms
TTFT: 81.60 ± 0.54 ms
TPOT: 71.52 ± 2.72 ms
Throughput: 13.98 ± 0.53 tokens/s
```

For more information on how performance metrics are calculated, see [performance metrics readme](../../../src/README.md#performance-metrics).
49 changes: 49 additions & 0 deletions samples/python/benchmark_genai/benchmark_genai.py
@@ -0,0 +1,49 @@
# Copyright (C) 2023-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import argparse
import openvino_genai as ov_genai

def main():
    parser = argparse.ArgumentParser(description="Help command")
    parser.add_argument("-m", "--model", type=str, help="Path to model and tokenizers base directory")
    parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
    parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
    parser.add_argument("-n", "--num_iter", type=int, default=2, help="Number of iterations")
    parser.add_argument("-mt", "--max_new_tokens", type=int, default=20, help="Maximal number of new tokens")
    parser.add_argument("-d", "--device", type=str, default="CPU", help="Device")

    args = parser.parse_args()

    # Perf metrics are stored in DecodedResults.
    # In order to get DecodedResults instead of a string, the input should be a list.
    prompt = [args.prompt]
    model_path = args.model
    device = args.device
    num_warmup = args.num_warmup
    num_iter = args.num_iter

    config = ov_genai.GenerationConfig()
    config.max_new_tokens = args.max_new_tokens

    pipe = ov_genai.LLMPipeline(model_path, device)

    for _ in range(num_warmup):
        pipe.generate(prompt, config)

    res = pipe.generate(prompt, config)
    perf_metrics = res.perf_metrics
    for _ in range(num_iter - 1):
        res = pipe.generate(prompt, config)
        perf_metrics += res.perf_metrics

    print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
    print(f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
    print(f"Tokenization time: {perf_metrics.get_tokenization_duration().mean:.2f} ± {perf_metrics.get_tokenization_duration().std:.2f} ms")
    print(f"Detokenization time: {perf_metrics.get_detokenization_duration().mean:.2f} ± {perf_metrics.get_detokenization_duration().std:.2f} ms")
    print(f"TTFT: {perf_metrics.get_ttft().mean:.2f} ± {perf_metrics.get_ttft().std:.2f} ms")
    print(f"TPOT: {perf_metrics.get_tpot().mean:.2f} ± {perf_metrics.get_tpot().std:.2f} ms")
    print(f"Throughput: {perf_metrics.get_throughput().mean:.2f} ± {perf_metrics.get_throughput().std:.2f} tokens/s")

if __name__ == "__main__":
    main()
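The same warm-up and accumulation pattern can be reused for quick device comparisons. A minimal sketch of such an extension (hypothetical, not part of the sample; it assumes the `openvino_genai` API used above, and the model directory and device list are placeholders to adjust):

```python
# Hypothetical extension of benchmark_genai.py: compare devices with the same
# warm-up + accumulate pattern. Model path and device list are placeholders.
import openvino_genai as ov_genai

model_dir = "TinyLlama-1.1B-Chat-v1.0"
prompt = ["The Sky is blue because"]  # list input -> DecodedResults with perf_metrics

config = ov_genai.GenerationConfig()
config.max_new_tokens = 20

for device in ["CPU", "GPU"]:  # adjust to the devices available on your machine
    pipe = ov_genai.LLMPipeline(model_dir, device)
    pipe.generate(prompt, config)  # warm-up
    res = pipe.generate(prompt, config)
    perf_metrics = res.perf_metrics
    for _ in range(2):  # two more measured iterations
        res = pipe.generate(prompt, config)
        perf_metrics += res.perf_metrics
    print(f"{device}: throughput {perf_metrics.get_throughput().mean:.2f} tokens/s, "
          f"TTFT {perf_metrics.get_ttft().mean:.2f} ms")
```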
16 changes: 13 additions & 3 deletions samples/python/chat_sample/README.md
@@ -17,15 +17,25 @@ optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B

`chat_sample.py TinyLlama-1.1B-Chat-v1.0`

To enable Unicode characters for Windows cmd open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### Troubleshooting

## Troubleshooting
### Missing chat template
#### Unicode characters encoding error on Windows

Example error:
```
UnicodeEncodeError: 'charmap' codec can't encode character '\u25aa' in position 0: character maps to <undefined>
```

If you encounter the error described in the example when the sample is printing output to the Windows console, it is likely due to the default Windows encoding not supporting certain Unicode characters. To resolve this:
1. Enable Unicode characters for Windows cmd - open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.
2. Enable UTF-8 mode by setting environment variable `PYTHONIOENCODING="utf8"`.

#### Missing chat template

If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
The following template can be used as a default, but it may not work properly with every model:
14 changes: 13 additions & 1 deletion samples/python/greedy_causal_lm/README.md
@@ -17,8 +17,20 @@ optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B

`greedy_causal_lm.py TinyLlama-1.1B-Chat-v1.0 "Why is the Sun yellow?"`

To enable Unicode characters for Windows cmd open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### Troubleshooting

#### Unicode characters encoding error on Windows

Example error:
```
UnicodeEncodeError: 'charmap' codec can't encode character '\u25aa' in position 0: character maps to <undefined>
```

If you encounter the error described in the example when the sample is printing output to the Windows console, it is likely due to the default Windows encoding not supporting certain Unicode characters. To resolve this:
1. Enable Unicode characters for Windows cmd - open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.
2. Enable UTF-8 mode by setting environment variable `PYTHONIOENCODING="utf8"`.
14 changes: 13 additions & 1 deletion samples/python/multinomial_causal_lm/README.md
@@ -17,8 +17,20 @@ optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B

`multinomial_causal_lm.py TinyLlama-1.1B-Chat-v1.0 "Why is the Sun yellow?"`

To enable Unicode characters for Windows cmd open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### Troubleshooting

#### Unicode characters encoding error on Windows

Example error:
```
UnicodeEncodeError: 'charmap' codec can't encode character '\u25aa' in position 0: character maps to <undefined>
```

If you encounter the error described in the example when the sample is printing output to the Windows console, it is likely due to the default Windows encoding not supporting certain Unicode characters. To resolve this:
1. Enable Unicode characters for Windows cmd - open `Region` settings from `Control panel`. `Administrative`->`Change system locale`->`Beta: Use Unicode UTF-8 for worldwide language support`->`OK`. Reboot.
2. Enable UTF-8 mode by setting environment variable `PYTHONIOENCODING="utf8"`.