Merge releases/2024/3 into master #720
Wovchena merged 60 commits into openvinotoolkit:master from Wovchena:merge-releases/2024/3-into-master on Aug 1, 2024
Changes from 59 commits
e4637b3 Workaround (#618) (Wovchena)
423c8e3 Revert to python3 (Wovchena)
8ad336c Revert to python3 (#622) (akladiev)
1b1b2f0 Fix cmake Python var name (#624) (Wovchena)
70b74ad Add ContinuousBatchingPipeline constructor similar to LLMPipeline (#604) (Wovchena)
f0c2677 Clear beam search info when generate() is finished. (#630) (popovaan)
73badf6 Update nncf_utils.py (#616) (#633) (KodiaqQ)
25655e3 Workaround cmake packaging (#634) (Wovchena)
754f6d7 Save licensing_genai into docs to align with OpenVINO (#637) (Wovchena)
e5247e0 Update submodule (#638) (Wovchena)
2d1fa3b Add Llama3 (#620) (Wovchena)
489a87d nightly->rc1 (#621) (Wovchena)
67f0467 Add OpenVINOGenAITargets to core_genai_dev COMPONENT (#642) (Wovchena)
1969160 Apply todo, initialize detokenizer's cache (#647) (Wovchena)
0e0f6a9 Cherry-pick static LLM pipeline changes (#654) (TolyaTalamanov)
cb100cb [Continuous batching] Replace standard max_element call with custom l… (mzegla)
f0e4190 wip (pavel-esir)
7cab496 add detokenization metric; refactor split to perf_conter & perf_metrics (pavel-esir)
bb1113c refactor structure, add python sample (pavel-esir)
7bf42f1 Cherry-pick custom max_element loop (#662) (mzegla)
0a8f0d9 add more preicise durations (pavel-esir)
bad01b9 Add note for pybind ov::Tensor issue (#659) (as-suvorov)
cb0da0a [OV 24.3]Fix multinomial sample CMakeList (#658) (sammysun0711)
bc92248 add Readme for tests (#664) (pavel-esir)
90320f4 add cpp Readme, ensured correct batch processing, add PerfMetrics to … (pavel-esir)
aeec730 use MeanStdPair (pavel-esir)
56eeafc [2024.3] Fix symbol encode error (#629) (yatarkan)
8934a0e [release branch] Add infer request queue for tokenizers and allow for… (dkalinowski)
12f8e44 Add max_new_tokens to every generate call in src/README.md (#670) (pavel-esir)
f9e45e1 Add CB naive chat (#644) (Wovchena)
03590c5 return back py::object -> AnyMap (#679) (pavel-esir)
53945f7 Update openvino_tokenizers (#680) (Wovchena)
a769b33 Allow dev and rc tokenizers (#681) (Wovchena)
e449ffe Fix chat templates with slices, add tokenizer config for `mistralai/M… (yatarkan)
406393f Prefix caching. (#675) (popovaan)
c45aed5 Merge remote-tracking branch 'upstream/releases/2024/3' into add_perf… (pavel-esir)
be2fdaf resolve conflicts (pavel-esir)
b00bcd8 apply comments (pavel-esir)
60e7188 uset getter and cache evaluate results (pavel-esir)
e553ef5 update Readme's (pavel-esir)
3bfbab5 StaticLLMPipeline dangling models hotfix (#693) (TolyaTalamanov)
102f00a add generation time metrics (#613) (andrei-kochin)
06c57b7 Remove Dockerfile (#700) (mzegla)
e286469 StaticLLMPipeline - align u4 zero points (#705) (TolyaTalamanov)
2a80828 Disable broken test (#707) (Wovchena)
d89cdcb update optimum commit for releases/2024/3 (#711) (eaidova)
2428a3a change commit for optimum (eaidova)
1473e7f Merge branch 'releases/2024/3' into ea/upd_opt_commit (eaidova)
8cb12b2 change commit for optimum (#714) (andrei-kochin)
2f778f3 Add perf metric docstrings (#713) (pavel-esir)
2dc6b64 rc1->rc2 (#695) (Wovchena)
3bfdd3f Docs for version compatibility (#692) (yatarkan)
a9d6541 Merge branch 'releases/2024/3' into merge-releases/2024/3-into-master (Wovchena)
e76f9f9 coorect after git merge conflict resolution (Wovchena)
9a0b7e9 coorect after git merge conflict resolution (Wovchena)
a88cfc8 coorect after git merge conflict resolution (Wovchena)
2311f6e coorect after git merge conflict resolution (Wovchena)
45937bc cache_size (Wovchena)
4962039 skip (Wovchena)
b5f21dc Correct links and typos (Wovchena)
@@ -0,0 +1,24 @@
# Copyright (C) 2023-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


find_package(OpenVINOGenAI REQUIRED PATHS
    "${CMAKE_BINARY_DIR}"  # Reuse the package from the build.
    ${OpenVINO_DIR}  # GenAI may be installed alongside OpenVINO.
)

FetchContent_Declare(cxxopts
    URL https://github.com/jarro2783/cxxopts/archive/refs/tags/v3.1.1.tar.gz
    URL_HASH SHA256=523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08)
FetchContent_MakeAvailable(cxxopts)

add_executable(benchmark_genai benchmark_genai.cpp)
target_link_libraries(benchmark_genai PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_genai PROPERTIES
    COMPILE_PDB_NAME benchmark_genai
    # Ensure out of box LC_RPATH on macOS with SIP
    INSTALL_RPATH_USE_LINK_PATH ON)
install(TARGETS benchmark_genai
        RUNTIME DESTINATION samples_bin/
        COMPONENT samples_bin
        EXCLUDE_FROM_ALL)
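The `URL_HASH SHA256=...` argument makes FetchContent reject a downloaded archive whose digest differs from the pinned value. The same check can be reproduced outside CMake. A minimal Python sketch, assuming the archive has already been downloaded; `verify_sha256` is a hypothetical helper introduced here, not part of CMake or of this repository:

```python
import hashlib

def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True when the SHA-256 digest of `data` matches `expected_hex`."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()

# Digest pinned for the cxxopts v3.1.1 archive in the CMake file above.
CXXOPTS_SHA256 = "523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08"

# Usage (the local file name is an assumption):
# with open("v3.1.1.tar.gz", "rb") as f:
#     assert verify_sha256(f.read(), CXXOPTS_SHA256)
```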
@@ -0,0 +1,47 @@
# LLMs benchmarking sample

This sample demonstrates how to benchmark LLMs in OpenVINO GenAI. It includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Usage

```sh
benchmark_genai [OPTIONS]
```

### Options

- `-m, --model` (default: `"."`): Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-n, --num_iter` (default: `3`): Number of iterations.
- `-d, --device` (default: `"CPU"`): Device to run the model on.

### Output:

```
benchmark_genai -m TinyLlama-1.1B-Chat-v1.0 -n 10
```

```
Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
Tokenization time: 0.51 ± 0.02 ms
Detokenization time: 0.37 ± 0.01 ms
TTFT: 81.60 ± 0.54 ms
TPOT: 71.52 ± 2.72 ms/token
Throughput: 13.98 ± 0.53 tokens/s
```

For more information on how performance metrics are calculated, see the [performance metrics README](../../../src/README.md#performance-metrics).
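The reported figures are related: TPOT is the time per output token in milliseconds, so throughput in tokens/s is roughly its reciprocal. A minimal Python sketch of a mean ± std summary and that conversion follows. The helper names and the per-token samples are made up for illustration, and this is not necessarily how `PerfMetrics` computes its values:

```python
from statistics import mean, pstdev

def summarize(samples_ms):
    """Return (mean, std) of a list of durations in milliseconds."""
    return mean(samples_ms), pstdev(samples_ms)

def throughput_from_tpot(tpot_ms):
    """Convert time per output token (ms/token) into tokens per second."""
    return 1000.0 / tpot_ms

# Hypothetical per-token durations, not measured data.
tpot_mean, tpot_std = summarize([70.0, 72.0, 74.0])
print(f"TPOT: {tpot_mean:.2f} ± {tpot_std:.2f} ms/token")

# Sanity check against the sample output above: 71.52 ms/token is about 13.98 tokens/s.
print(f"Throughput: {throughput_from_tpot(71.52):.2f} tokens/s")
```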
@@ -0,0 +1,76 @@
// Copyright (C) 2023-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include "openvino/genai/llm_pipeline.hpp"
#include <cxxopts.hpp>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) try {
    cxxopts::Options options("benchmark_genai", "Help command");

    options.add_options()
    ("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>()->default_value("."))
    ("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("The Sky is blue because"))
    ("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
    ("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
    ("mt,max_new_tokens", "Maximal number of new tokens", cxxopts::value<size_t>()->default_value(std::to_string(20)))
    ("d,device", "device", cxxopts::value<std::string>()->default_value("CPU"))
    ("h,help", "Print usage");

    cxxopts::ParseResult result;
    try {
        result = options.parse(argc, argv);
    } catch (const cxxopts::exceptions::exception& e) {
        std::cout << e.what() << "\n\n";
        std::cout << options.help() << std::endl;
        return EXIT_FAILURE;
    }

    if (result.count("help")) {
        std::cout << options.help() << std::endl;
        return EXIT_SUCCESS;
    }

    std::string prompt = result["prompt"].as<std::string>();
    const std::string model_path = result["model"].as<std::string>();
    std::string device = result["device"].as<std::string>();
    size_t num_warmup = result["num_warmup"].as<size_t>();
    size_t num_iter = result["num_iter"].as<size_t>();

    ov::genai::GenerationConfig config;
    config.max_new_tokens = result["max_new_tokens"].as<size_t>();

    ov::genai::LLMPipeline pipe(model_path, device);

    // Warm-up runs are excluded from the reported metrics.
    for (size_t i = 0; i < num_warmup; i++)
        pipe.generate(prompt, config);

    ov::genai::DecodedResults res = pipe.generate(prompt, config);
    ov::genai::PerfMetrics metrics = res.perf_metrics;
    // Start at 1: the first measured run is above, and num_iter == 0 must not underflow size_t.
    for (size_t i = 1; i < num_iter; i++) {
        res = pipe.generate(prompt, config);
        metrics = metrics + res.perf_metrics;
    }

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
    std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
    std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
    std::cout << "Detokenization time: " << metrics.get_detokenization_duration().mean << " ± " << metrics.get_detokenization_duration().std << " ms" << std::endl;
    std::cout << "TTFT: " << metrics.get_ttft().mean << " ± " << metrics.get_ttft().std << " ms" << std::endl;
    std::cout << "TPOT: " << metrics.get_tpot().mean << " ± " << metrics.get_tpot().std << " ms/token" << std::endl;
    std::cout << "Throughput: " << metrics.get_throughput().mean << " ± " << metrics.get_throughput().std << " tokens/s" << std::endl;

    return 0;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return EXIT_FAILURE;
} catch (...) {
    std::cerr << "Non-exception object thrown\n";
    return EXIT_FAILURE;
}
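The loop structure above (untimed warm-up runs followed by measured iterations that are aggregated) does not depend on the GenAI API. A minimal Python sketch of the same pattern using `time.perf_counter`; `run_benchmark` and the dummy workload are hypothetical names introduced for illustration, and wall-clock timing here only stands in for the pipeline's own `PerfMetrics`:

```python
import time
from statistics import mean, pstdev

def run_benchmark(fn, num_warmup=1, num_iter=3):
    """Call `fn` num_warmup times untimed, then time num_iter calls.

    Returns (mean_ms, std_ms) over the measured iterations.
    """
    if num_iter < 1:
        raise ValueError("num_iter must be at least 1")
    for _ in range(num_warmup):
        fn()  # warm-up: results are discarded
    durations_ms = []
    for _ in range(num_iter):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(durations_ms), pstdev(durations_ms)

# Dummy workload standing in for pipe.generate(prompt, config).
mean_ms, std_ms = run_benchmark(lambda: sum(range(10_000)), num_warmup=1, num_iter=5)
print(f"Generate time: {mean_ms:.2f} ± {std_ms:.2f} ms")
```

Rejecting `num_iter < 1` up front avoids reporting statistics over an empty set of measurements.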
@@ -0,0 +1,47 @@
# LLMs benchmarking sample

This sample script demonstrates how to benchmark LLMs in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Usage

```sh
python benchmark_vanilla_genai.py [OPTIONS]
```

### Options

- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-n, --num_iter` (default: `2`): Number of iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-d, --device` (default: `"CPU"`): Device to run the model on.

### Output:

```
python benchmark_vanilla_genai.py -m TinyLlama-1.1B-Chat-v1.0 -n 10
```

```
Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
Tokenization time: 0.51 ± 0.02 ms
Detokenization time: 0.37 ± 0.01 ms
TTFT: 81.60 ± 0.54 ms
TPOT: 71.52 ± 2.72 ms
Throughput: 13.98 ± 0.53 tokens/s
```

For more information on how performance metrics are calculated, see [performance metrics readme](../../../src/README.md#performance-metrics).
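When the sample is run from another script, its `name: mean ± std unit` report lines can be turned back into numbers. A minimal sketch, assuming the output keeps that line shape; `parse_metrics` and `LINE_RE` are hypothetical names introduced here and are not part of the sample:

```python
import re

# "<name>: <mean>" optionally followed by "± <std>"; the trailing unit is ignored.
LINE_RE = re.compile(
    r"^(?P<name>[^:]+?)\s*:\s*(?P<mean>\d+(?:\.\d+)?)"
    r"(?:\s*±\s*(?P<std>\d+(?:\.\d+)?))?"
)

def parse_metrics(text):
    """Map each metric name to a (mean, std) tuple; std is None if absent."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            std = match.group("std")
            metrics[match.group("name")] = (
                float(match.group("mean")),
                float(std) if std is not None else None,
            )
    return metrics

sample = """Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
TTFT: 81.60 ± 0.54 ms"""
parsed = parse_metrics(sample)
print(parsed["Generate time"])
```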
@@ -0,0 +1,49 @@
# Copyright (C) 2023-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import argparse
import openvino_genai as ov_genai

def main():
    parser = argparse.ArgumentParser(description="Help command")
    parser.add_argument("-m", "--model", type=str, help="Path to model and tokenizers base directory")
    parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
    parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
    parser.add_argument("-n", "--num_iter", type=int, default=2, help="Number of iterations")
    parser.add_argument("-mt", "--max_new_tokens", type=int, default=20, help="Maximal number of new tokens")
    parser.add_argument("-d", "--device", type=str, default="CPU", help="Device")

    args = parser.parse_args()

    # Perf metrics are stored in DecodedResults.
    # To get DecodedResults instead of a string, the input should be a list.
    prompt = [args.prompt]
    model_path = args.model
    device = args.device
    num_warmup = args.num_warmup
    num_iter = args.num_iter

    config = ov_genai.GenerationConfig()
    config.max_new_tokens = args.max_new_tokens

    pipe = ov_genai.LLMPipeline(model_path, device)

    for _ in range(num_warmup):
        pipe.generate(prompt, config)

    res = pipe.generate(prompt, config)
    perf_metrics = res.perf_metrics
    for _ in range(num_iter - 1):
        res = pipe.generate(prompt, config)
        perf_metrics += res.perf_metrics

    print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
    print(f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
    print(f"Tokenization time: {perf_metrics.get_tokenization_duration().mean:.2f} ± {perf_metrics.get_tokenization_duration().std:.2f} ms")
    print(f"Detokenization time: {perf_metrics.get_detokenization_duration().mean:.2f} ± {perf_metrics.get_detokenization_duration().std:.2f} ms")
    print(f"TTFT: {perf_metrics.get_ttft().mean:.2f} ± {perf_metrics.get_ttft().std:.2f} ms")
    print(f"TPOT: {perf_metrics.get_tpot().mean:.2f} ± {perf_metrics.get_tpot().std:.2f} ms")
    print(f"Throughput: {perf_metrics.get_throughput().mean:.2f} ± {perf_metrics.get_throughput().std:.2f} tokens/s")

if __name__ == "__main__":
    main()
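The `perf_metrics += res.perf_metrics` pattern relies on metrics objects being combinable, so that the mean and standard deviation cover all measured runs. One way such accumulation could work is to keep the raw samples and compute statistics on demand. A minimal sketch; `DurationStats` is a hypothetical class with made-up sample values and does not reflect the actual `PerfMetrics` implementation:

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev

@dataclass
class DurationStats:
    """Hypothetical container of raw duration samples in milliseconds."""
    samples_ms: list = field(default_factory=list)

    def __add__(self, other):
        # Combining two objects concatenates their raw samples.
        return DurationStats(self.samples_ms + other.samples_ms)

    @property
    def mean(self):
        return mean(self.samples_ms)

    @property
    def std(self):
        return pstdev(self.samples_ms)

# Made-up durations for three runs, not measured data.
total = DurationStats([1430.0])
for run in ([1428.0], [1435.0]):
    total += DurationStats(run)  # no __iadd__ defined, so this uses __add__
print(f"Generate time: {total.mean:.2f} ± {total.std:.2f} ms")
```

Keeping raw samples rather than running averages means the standard deviation stays exact no matter how many results are added.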
do we have CI runs for these new samples? I don't see it
No. It was last hour merge
do we have a task for it?
Created 148650