
Commit

test
Signed-off-by: ZePan110 <[email protected]>
ZePan110 committed Sep 27, 2024
1 parent 99cefe2 commit c489bf0
Showing 3 changed files with 227 additions and 227 deletions.
207 changes: 112 additions & 95 deletions evals/README.md
@@ -1,115 +1,132 @@
# StressCli

This project includes a benchmark toolset for AI workloads such as OPEA.

## stresscli.py

`stresscli.py` is a command line tool for dumping test specs and performing load tests.

### Prerequisites

This tool uses `kubectl` to collect Kubernetes cluster information, so make sure `kubectl` is installed and configured for access to your cluster.

This tool uses `locust` by default for load testing. Install `locust` on your machine with:
```
pip3 install locust
```
For detailed information about `locust`, see the [locust website](https://docs.locust.io/en/stable/installation.html).

### Installation

The recommended way to install and run stresscli is in a virtualenv with the latest version of Python 3 (at least Python 3.11). If Python is not installed, you can likely install it using your distribution's
package manager, or see the [Python Download page](https://www.python.org/downloads/).

```bash
git clone https://github.com/opea-project/GenAIEval.git
# go to the stresscli directory
cd GenAIEval/evals/benchmark/stresscli
# create a virtual env
python3 -m venv stresscli_virtualenv
source stresscli_virtualenv/bin/activate
# install requirements
pip install -r requirements.txt
```
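As an optional sanity check (not part of the stresscli instructions above), you can confirm inside the virtualenv that the `locust` dependency resolved correctly:

```python
# Optional check: verify that locust is importable from this virtualenv.
import locust

print("locust version:", locust.__version__)
```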

### Usage

```
./stresscli.py --help
Usage: stresscli.py [OPTIONS] COMMAND [ARGS]...

  StressCLI - A command line tool for stress testing OPEA workloads.

Options:
  --kubeconfig PATH  Configuration file to Kubernetes
  --help             Show this message and exit.

Commands:
  dump       Dump the test spec
  load-test  Do load test
  report     Print the test report
  validate   Validate against the test spec
```

#### Run a test

**Note: Please edit the [run.yaml](./run.yaml) file or create your own profile before running the load test.**

```
./stresscli.py load-test --profile run.yaml
```

More detailed options:
```
./stresscli.py load-test --help
Usage: stresscli.py load-test [OPTIONS]

  Do load test

Options:
  --profile PATH  Path to profile YAML file
  --help          Show this message and exit.
```

#### Generate the test output report

You can generate the report for test cases by:
```
./stresscli.py report --folder /home/sdp/test_reports/20240710_004105 --format csv -o data.csv
```

More detailed options:
```
./stresscli.py report --help
Usage: stresscli.py report [OPTIONS]

  Print the test report

Options:
  --folder PATH      Path to log folder  [required]
  --format TEXT      Output format, plain_text or csv, default is plain_text
  --include TEXT     Extract output data from output.log, stats.csv, and
                     testspec.yaml, default is
                     output.log|stats.csv|testspec.yaml
  -o, --output PATH  Save output to file
  --help             Show this message and exit.
```
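If you export the report with `--format csv`, the resulting file is an ordinary CSV. A generic Python sketch like the following can be used to take a quick look at it; the file name and columns are whatever you exported, nothing specific is assumed:

```python
# Quick look at an exported report CSV; no particular column names are assumed.
import csv

with open("data.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print("columns:", list(rows[0].keys()) if rows else [])
print("data rows:", len(rows))
```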
#### Dump the configuration

You can dump the current testing profile by:
```
./stresscli.py dump -o <output_file>
```
More detailed options:
```
./stresscli.py dump --help
Usage: stresscli.py dump [OPTIONS]

  Dump the test spec

Options:
  -o, --output PATH  YAML file of cluster configuration  [required]
  --help             Show this message and exit.
```

#### Validate against the spec

You can validate whether the current K8s and workload deployments comply with the test spec by:
```
./stresscli.py validate --file testspec.yaml
```

More detailed options:
```
./stresscli.py validate --help
Usage: stresscli.py validate [OPTIONS]

  Validate against the test spec

Options:
  --file PATH          Specification YAML file to validate against  [required]
  --validate_topology  Validate topology in workload specification
  --help               Show this message and exit.
```
115 changes: 115 additions & 0 deletions evals/benchmark/README.md
@@ -0,0 +1,115 @@
# OPEA Benchmark Tool

This tool provides a microservice benchmarking framework that uses YAML configurations to define test cases for different services. It executes these tests using `stresscli`, which is built on top of [locust](https://github.com/locustio/locust), a performance/load testing tool for HTTP and other protocols, and logs the results for performance analysis and data visualization.

## Features

- **Services load testing**: Simulates high concurrency levels to test services like LLM, reranking, ASR, E2E and more.
- **YAML-based configuration**: Define test cases, service endpoints, and testing parameters in YAML.
- **Service metrics collection**: Optionally collect service metrics for detailed performance analysis.
- **Flexible testing**: Supports various test cases like chatqna, codegen, codetrans, faqgen, audioqna, and visualqna.
- **Data analysis and visualization**: After tests are executed, results can be analyzed and visualized to gain insights into the performance and behavior of each service. Performance trends, bottlenecks, and other key metrics are highlighted for decision-making.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Test Suite Configuration](#test-suite-configuration)
- [Test Cases](#test-cases)


## Installation

### Prerequisites

- Python 3.x
- Install the required Python packages:

```bash
pip install -r ../../requirements.txt
```

## Usage

1. Define the test cases and configurations in the `benchmark.yaml` file.

2. Temporarily increase the file descriptor limit before running the test:

```bash
ulimit -n 100000
```

This command increases the maximum number of file descriptors (which represent open files, network connections, etc.) that a single process can use. By default, many systems set a conservative limit, such as 1024, which may not be sufficient for high-concurrency applications or large-scale load testing. Raising this limit ensures that the process can handle a larger number of open connections or files without running into errors caused by insufficient file descriptors.
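If you prefer to check or raise the limit from Python rather than the shell, a minimal sketch using the standard `resource` module (Linux/macOS only; a hypothetical helper, not part of `benchmark.py`) looks like this:

```python
# Hypothetical helper: inspect and raise the soft file-descriptor limit,
# mirroring what `ulimit -n` does for the current shell.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current file-descriptor limits: soft={soft}, hard={hard}")

# The soft limit can be raised up to the hard limit without extra privileges.
target = 100_000 if hard == resource.RLIM_INFINITY else min(100_000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```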

3. Run the benchmark script:

```bash
python benchmark.py
```

The results will be stored in the directory specified by `test_output_dir` in the configuration.
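For example, once a run finishes you can list whatever artifacts it produced with a few lines of Python; only the `test_output_dir` value from your own configuration is assumed:

```python
# List the files produced by a benchmark run, newest first.
from pathlib import Path

output_dir = Path("/home/sdp/benchmark_output")  # your test_output_dir value
files = [p for p in output_dir.rglob("*") if p.is_file()]
for path in sorted(files, key=lambda p: p.stat().st_mtime, reverse=True):
    print(path.relative_to(output_dir))
```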


## Configuration

The benchmark.yaml file defines the test suite and individual test cases. Below are the primary sections:

### Test Suite Configuration

```yaml
test_suite_config:
  examples: ["chatqna"]  # Test cases to be run (e.g., chatqna, codegen)
  deployment_type: "k8s"  # Default is "k8s", can also be "docker"
  service_ip: None  # Leave as None for k8s, specify for Docker
  service_port: None  # Leave as None for k8s, specify for Docker
  load_shape:  # Tenant concurrency pattern
    name: constant  # poisson or constant (locust default load shape)
    params:  # Load shape specific parameters
      constant:  # Constant load shape specific parameters, activate only if load_shape is constant
        concurrent_level: 4  # If user_queries is specified, concurrent_level is the target number of requests per user. If not, it is the number of simulated users
      poisson:  # Poisson load shape specific parameters, activate only if load_shape is poisson
        arrival-rate: 1.0  # Request arrival rate
  warm_ups: 0  # Number of test requests for warm-up
  run_time: 60m  # Total runtime for the test suite
  user_queries: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # Number of test requests
  query_timeout: 120  # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
  random_prompt: false  # Use random prompts if true, fixed prompts if false
  collect_service_metric: false  # Enable service metrics collection
  data_visualization: false  # Enable data visualization
  test_output_dir: "/home/sdp/benchmark_output"  # Directory for test outputs
```
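For intuition about the `load_shape` settings above: locust models such patterns with a `LoadTestShape` class whose `tick()` method returns the desired user count over time. The sketch below is a simplified illustration of a constant shape under assumed numbers; it is not the shape implementation the benchmark tool actually ships.

```python
# Simplified illustration of a constant load shape in locust (hypothetical
# values; the real tool derives these from the YAML configuration).
from locust import LoadTestShape


class ConstantShape(LoadTestShape):
    users = 4          # roughly corresponds to concurrent_level
    spawn_rate = 4     # users started per second
    time_limit = 60    # seconds; run_time in the YAML plays this role

    def tick(self):
        # locust polls tick(); returning None stops the test.
        if self.get_run_time() > self.time_limit:
            return None
        return (self.users, self.spawn_rate)
```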
### Test Cases
Each test case includes multiple services, each of which can be toggled on/off using the `run_test` flag. You can also change specific parameters for each service for performance tuning.

Example test case configuration for `chatqna`:

```yaml
test_cases:
  chatqna:
    embedding:
      run_test: false
      service_name: "embedding-svc"
    retriever:
      run_test: false
      service_name: "retriever-svc"
      parameters:
        search_type: "similarity"
        k: 4
        fetch_k: 20
        lambda_mult: 0.5
        score_threshold: 0.2
    llm:
      run_test: false
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 128
        temperature: 0.01
        streaming: true
    e2e:
      run_test: true
      service_name: "chatqna-backend-server-svc"
```
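Since the `run_test` flags live in plain YAML, it is easy to script a quick overview of what a configuration will actually exercise. A hypothetical helper (requires PyYAML; the file name is an assumption) might look like:

```python
# List which services of each test case have run_test enabled.
import yaml  # pip install pyyaml

with open("benchmark.yaml") as f:
    config = yaml.safe_load(f)

for case_name, services in config.get("test_cases", {}).items():
    enabled = [name for name, svc in services.items()
               if isinstance(svc, dict) and svc.get("run_test")]
    print(f"{case_name}: run_test enabled for {enabled}")
```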