Commit
Signed-off-by: ZePan110 <[email protected]>
Showing 3 changed files with 227 additions and 227 deletions.
@@ -1,115 +1,132 @@

# StressCli

This project includes a benchmark toolset for AI workloads such as OPEA.

## stresscli.py

`stresscli.py` is a command line tool for dumping test specs and performing load tests.

### Prerequisites

This tool uses `kubectl` to collect Kubernetes cluster information, so make sure `kubectl` is installed and configured to access your cluster.

This tool uses `locust` by default to run load tests. Install `locust` on your machine with:

```
pip3 install locust
```

For detailed information about `locust`, see the [locust website](https://docs.locust.io/en/stable/installation.html).
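
Before moving on, you can sanity-check both prerequisites from your shell (a minimal sketch; exact output will differ per environment):

```bash
# kubectl must be installed and able to reach the target cluster
kubectl version --client
kubectl get nodes

# locust must be available in the environment you will run stresscli from
locust --version
```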

### Installation

The recommended way to install and run stresscli is in a virtualenv with the latest version of Python 3 (at least Python 3.11). If Python is not installed, you can likely install it using your distribution's package manager, or see the [Python Download page](https://www.python.org/downloads/).

```bash
git clone https://github.com/opea-project/GenAIEval.git
# go to project root
cd GenAIEval/evals/benchmark/stresscli
# create virtual env
python3 -m venv stresscli_virtualenv
source stresscli_virtualenv/bin/activate
# install requirements
pip install -r requirements.txt
```

### Usage

```
./stresscli.py --help
Usage: stresscli.py [OPTIONS] COMMAND [ARGS]...

  StressCLI - A command line tool for stress testing OPEA workloads.

Options:
  --kubeconfig PATH  Configuration file to Kubernetes
  --help             Show this message and exit.

Commands:
  dump       Dump the test spec
  load-test  Do load test
  report     Print the test report
  validate   Validate against the test spec
```

#### Run a test

**Note: Please edit the [run.yaml](./run.yaml) file or create your own profile before running the load test.**

```
./stresscli.py load-test --profile run.yaml
```

More detailed options:

```
./stresscli.py load-test --help
Usage: stresscli.py load-test [OPTIONS]

  Do load test

Options:
  --profile PATH  Path to profile YAML file
  --help          Show this message and exit.
```

#### Generate the test output report

You can generate the report for test cases by:

```
./stresscli.py report --folder /home/sdp/test_reports/20240710_004105 --format csv -o data.csv
```

More detailed options:

```
./stresscli.py report --help
Usage: stresscli.py report [OPTIONS]

  Print the test report

Options:
  --folder PATH      Path to log folder  [required]
  --format TEXT      Output format, plain_text or csv, default is plain_text
  --include TEXT     Extract output data from output.log, stats.csv, and
                     testspec.yaml, default is output.log|stats.csv|testspec.yaml
  -o, --output PATH  Save output to file
  --help             Show this message and exit.
```

#### Dump the configuration

You can dump the current testing profile by:

```
./stresscli.py dump -o <output_file>
```

More detailed options:

```
./stresscli.py dump --help
Usage: stresscli.py dump [OPTIONS]

  Dump the test spec

Options:
  -o, --output PATH  YAML file of cluster configuration  [required]
  --help             Show this message and exit.
```

#### Validate against the spec

You can validate whether the current K8s and workload deployment complies with the test spec by:

```
./stresscli.py validate --file testspec.yaml
```

More detailed options:

```
./stresscli.py validate --help
Usage: stresscli.py validate [OPTIONS]

  Validate against the test spec

Options:
  --file PATH          Specification YAML file to validate against  [required]
  --validate_topology  Validate topology in workload specification
  --help               Show this message and exit.
```

@@ -0,0 +1,115 @@

# OPEA Benchmark Tool

This tool provides a microservice benchmarking framework that uses YAML configurations to define test cases for different services. It executes these tests using `stresscli`, built on top of [locust](https://github.com/locustio/locust), a performance/load testing tool for HTTP and other protocols, and logs the results for performance analysis and data visualization.

## Features

- **Services load testing**: Simulates high concurrency levels to test services like LLM, reranking, ASR, E2E and more.
- **YAML-based configuration**: Define test cases, service endpoints, and testing parameters in YAML.
- **Service metrics collection**: Optionally collect service metrics for detailed performance analysis.
- **Flexible testing**: Supports various test cases like chatqna, codegen, codetrans, faqgen, audioqna, and visualqna.
- **Data analysis and visualization**: After tests are executed, results can be analyzed and visualized to gain insights into the performance and behavior of each service. Performance trends, bottlenecks, and other key metrics are highlighted for decision-making.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
  - [Test Suite Configuration](#test-suite-configuration)
  - [Test Cases](#test-cases)

## Installation

### Prerequisites

- Python 3.x
- Install the required Python packages:

```bash
pip install -r ../../requirements.txt
```

## Usage

1. Define the test cases and configurations in the `benchmark.yaml` file.

2. Temporarily increase the file descriptor limit before running the test:

```bash
ulimit -n 100000
```

This command increases the maximum number of file descriptors (which represent open files, network connections, etc.) that a single process can use. By default, many systems set a conservative limit, such as 1024, which may not be sufficient for high-concurrency applications or large-scale load testing. Raising this limit ensures that the process can handle a larger number of open connections or files without running into errors caused by insufficient file descriptors.
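
To see what you are starting from, you can check the current limits before raising them (a quick sketch; `ulimit` is a shell builtin, so run it in the same shell you will launch the benchmark from):

```bash
ulimit -Sn                      # current soft limit for open file descriptors
ulimit -Hn                      # hard limit; the soft limit cannot exceed this without elevated privileges
ulimit -n 100000 && ulimit -n   # raise the soft limit for this shell and confirm it
```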

3. Run the benchmark script:

```bash
python benchmark.py
```

The results will be stored in the directory specified by `test_output_dir` in the configuration.

## Configuration

The `benchmark.yaml` file defines the test suite and individual test cases. Below are the primary sections:

### Test Suite Configuration

```yaml
test_suite_config:
  examples: ["chatqna"]    # Test cases to be run (e.g., chatqna, codegen)
  deployment_type: "k8s"   # Default is "k8s", can also be "docker"
  service_ip: None         # Leave as None for k8s, specify for Docker
  service_port: None       # Leave as None for k8s, specify for Docker
  load_shape:              # Tenant concurrency pattern
    name: constant         # poisson or constant (locust default load shape)
    params:                # Load-shape-specific parameters
      constant:            # Constant load shape specific parameters, active only if load_shape is constant
        concurrent_level: 4  # If user_queries is specified, concurrent_level is the target number of requests per user. If not, it is the number of simulated users
      poisson:             # Poisson load shape specific parameters, active only if load_shape is poisson
        arrival-rate: 1.0  # Request arrival rate
  warm_ups: 0              # Number of test requests for warm-up
  run_time: 60m            # Total runtime for the test suite
  user_queries: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # Number of test requests
  query_timeout: 120       # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
  random_prompt: false     # Use random prompts if true, fixed prompts if false
  collect_service_metric: false  # Enable service metrics collection
  data_visualization: false      # Enable data visualization
  test_output_dir: "/home/sdp/benchmark_output"  # Directory for test outputs
```
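
For example, to drive the same suite with a Poisson arrival process instead of a fixed number of concurrent users, only the `load_shape` block changes while the other keys stay as above (a sketch built from the fields shown; the rate value is arbitrary):

```yaml
test_suite_config:
  load_shape:
    name: poisson            # switch from the default constant shape
    params:
      poisson:
        arrival-rate: 2.0    # request arrival rate for the Poisson process
```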

### Test Cases

Each test case includes multiple services, each of which can be toggled on/off using the `run_test` flag. You can also change specific parameters for each service for performance tuning.

Example test case configuration for `chatqna`:

```yaml
test_cases:
  chatqna:
    embedding:
      run_test: false
      service_name: "embedding-svc"
    retriever:
      run_test: false
      service_name: "retriever-svc"
      parameters:
        search_type: "similarity"
        k: 4
        fetch_k: 20
        lambda_mult: 0.5
        score_threshold: 0.2
    llm:
      run_test: false
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 128
        temperature: 0.01
        streaming: true
    e2e:
      run_test: true
      service_name: "chatqna-backend-server-svc"
```
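
As the `run_test` flags above suggest, you can benchmark an individual microservice in addition to the end-to-end path. For instance, a sketch that enables the LLM service test and tunes its generation length, reusing only fields from the example above:

```yaml
test_cases:
  chatqna:
    llm:
      run_test: true          # toggle this service's benchmark on
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 512   # generate longer responses to stress the LLM service
        temperature: 0.01
        streaming: true
```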