Commit
Signed-off-by: ZePan110 <[email protected]>
Showing 3 changed files with 227 additions and 227 deletions.
@@ -1,115 +1,132 @@

# StressCli

This project includes a benchmark toolset for AI workloads such as OPEA.

## stresscli.py

`stresscli.py` is a command line tool for dumping test specs and performing load tests.

### Prerequisites

This tool uses `kubectl` to collect Kubernetes cluster information, so make sure `kubectl` is installed and configured to access your cluster.

This tool uses `locust` by default to run load tests. Install `locust` on your machine with:

```
pip3 install locust
```

For detailed information about `locust`, see the [locust website](https://docs.locust.io/en/stable/installation.html).
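
Before moving on, you can sanity-check both prerequisites from your shell (a minimal sketch; exact output will differ per environment):

```bash
# kubectl must be installed and able to reach the target cluster
kubectl version --client
kubectl get nodes

# locust must be available in the environment you will run stresscli from
locust --version
```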

### Installation

The recommended way to install and run stresscli is in a virtualenv with the latest version of Python 3 (at least Python 3.11). If Python is not installed, you can likely install it using your distribution's package manager, or see the [Python Download page](https://www.python.org/downloads/).

```bash
git clone https://github.com/opea-project/GenAIEval.git
# go to project root
cd GenAIEval/evals/benchmark/stresscli
# create virtual env
python3 -m venv stresscli_virtualenv
source stresscli_virtualenv/bin/activate
# install requirements
pip install -r requirements.txt
```

### Usage

```
./stresscli.py --help
Usage: stresscli.py [OPTIONS] COMMAND [ARGS]...

  StressCLI - A command line tool for stress testing OPEA workloads.

Options:
  --kubeconfig PATH  Configuration file to Kubernetes
  --help             Show this message and exit.

Commands:
  dump       Dump the test spec
  load-test  Do load test
  report     Print the test report
  validate   Validate against the test spec
```

#### Run a test

**Note: Please edit the [run.yaml](./run.yaml) file or create your own profile before running the load test.**

```
./stresscli.py load-test --profile run.yaml
```

More detailed options:

```
./stresscli.py load-test --help
Usage: stresscli.py load-test [OPTIONS]

  Do load test

Options:
  --profile PATH  Path to profile YAML file
  --help          Show this message and exit.
```

#### Generate the test output report

You can generate the report for test cases by:

```
./stresscli.py report --folder /home/sdp/test_reports/20240710_004105 --format csv -o data.csv
```

More detailed options:

```
./stresscli.py report --help
Usage: stresscli.py report [OPTIONS]

  Print the test report

Options:
  --folder PATH      Path to log folder  [required]
  --format TEXT      Output format, plain_text or csv, default is plain_text
  --include TEXT     Extract output data from output.log, stats.csv, and
                     testspec.yaml, default is output.log|stats.csv|testspec.yaml
  -o, --output PATH  Save output to file
  --help             Show this message and exit.
```

#### Dump the configuration

You can dump the current testing profile by:

```
./stresscli.py dump -o <output_file>
```

More detailed options:

```
./stresscli.py dump --help
Usage: stresscli.py dump [OPTIONS]

  Dump the test spec

Options:
  -o, --output PATH  YAML file of cluster configuration  [required]
  --help             Show this message and exit.
```

#### Validate against the spec

You can validate whether the current K8s and workload deployment complies with the test spec by:

```
./stresscli.py validate --file testspec.yaml
```

More detailed options:

```
./stresscli.py validate --help
Usage: stresscli.py validate [OPTIONS]

  Validate against the test spec

Options:
  --file PATH          Specification YAML file to validate against  [required]
  --validate_topology  Validate topology in workload specification
  --help               Show this message and exit.
```

@@ -0,0 +1,115 @@

# OPEA Benchmark Tool

This tool provides a microservice benchmarking framework that uses YAML configurations to define test cases for different services. It executes these tests using `stresscli`, built on top of [locust](https://github.com/locustio/locust), a performance/load testing tool for HTTP and other protocols, and logs the results for performance analysis and data visualization.

## Features

- **Services load testing**: Simulates high concurrency levels to test services like LLM, reranking, ASR, E2E and more.
- **YAML-based configuration**: Define test cases, service endpoints, and testing parameters in YAML.
- **Service metrics collection**: Optionally collect service metrics for detailed performance analysis.
- **Flexible testing**: Supports various test cases like chatqna, codegen, codetrans, faqgen, audioqna, and visualqna.
- **Data analysis and visualization**: After tests are executed, results can be analyzed and visualized to gain insights into the performance and behavior of each service. Performance trends, bottlenecks, and other key metrics are highlighted for decision-making.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
  - [Test Suite Configuration](#test-suite-configuration)
  - [Test Cases](#test-cases)

## Installation

### Prerequisites

- Python 3.x
- Install the required Python packages:

```bash
pip install -r ../../requirements.txt
```

## Usage

1. Define the test cases and configurations in the `benchmark.yaml` file.

2. Temporarily increase the file descriptor limit before running the test:

```bash
ulimit -n 100000
```

This command increases the maximum number of file descriptors (which represent open files, network connections, etc.) that a single process can use. By default, many systems set a conservative limit, such as 1024, which may not be sufficient for high-concurrency applications or large-scale load testing. Raising this limit ensures that the process can handle a larger number of open connections or files without running into errors caused by insufficient file descriptors.
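
To see what you are starting from, you can check the current limits before raising them (a quick sketch; `ulimit` is a shell builtin, so run it in the same shell you will launch the benchmark from):

```bash
ulimit -Sn                      # current soft limit for open file descriptors
ulimit -Hn                      # hard limit; the soft limit cannot exceed this without elevated privileges
ulimit -n 100000 && ulimit -n   # raise the soft limit for this shell and confirm it
```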

3. Run the benchmark script:

```bash
python benchmark.py
```

The results will be stored in the directory specified by `test_output_dir` in the configuration.

## Configuration

The `benchmark.yaml` file defines the test suite and individual test cases. Below are the primary sections:

### Test Suite Configuration

```yaml
test_suite_config:
  examples: ["chatqna"]    # Test cases to be run (e.g., chatqna, codegen)
  deployment_type: "k8s"   # Default is "k8s", can also be "docker"
  service_ip: None         # Leave as None for k8s, specify for Docker
  service_port: None       # Leave as None for k8s, specify for Docker
  load_shape:              # Tenant concurrency pattern
    name: constant         # poisson or constant (locust default load shape)
    params:                # Load-shape-specific parameters
      constant:            # Constant load shape specific parameters, active only if load_shape is constant
        concurrent_level: 4  # If user_queries is specified, concurrent_level is the target number of requests per user. If not, it is the number of simulated users
      poisson:             # Poisson load shape specific parameters, active only if load_shape is poisson
        arrival-rate: 1.0  # Request arrival rate
  warm_ups: 0              # Number of test requests for warm-up
  run_time: 60m            # Total runtime for the test suite
  user_queries: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # Number of test requests
  query_timeout: 120       # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
  random_prompt: false     # Use random prompts if true, fixed prompts if false
  collect_service_metric: false  # Enable service metrics collection
  data_visualization: false      # Enable data visualization
  test_output_dir: "/home/sdp/benchmark_output"  # Directory for test outputs
```
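
For example, to drive the same suite with a Poisson arrival process instead of a fixed number of concurrent users, only the `load_shape` block changes while the other keys stay as above (a sketch built from the fields shown; the rate value is arbitrary):

```yaml
test_suite_config:
  load_shape:
    name: poisson            # switch from the default constant shape
    params:
      poisson:
        arrival-rate: 2.0    # request arrival rate for the Poisson process
```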

### Test Cases

Each test case includes multiple services, each of which can be toggled on/off using the `run_test` flag. You can also change specific parameters for each service for performance tuning.

Example test case configuration for `chatqna`:

```yaml
test_cases:
  chatqna:
    embedding:
      run_test: false
      service_name: "embedding-svc"
    retriever:
      run_test: false
      service_name: "retriever-svc"
      parameters:
        search_type: "similarity"
        k: 4
        fetch_k: 20
        lambda_mult: 0.5
        score_threshold: 0.2
    llm:
      run_test: false
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 128
        temperature: 0.01
        streaming: true
    e2e:
      run_test: true
      service_name: "chatqna-backend-server-svc"
```
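
As the `run_test` flags above suggest, you can benchmark an individual microservice in addition to the end-to-end path. For instance, a sketch that enables the LLM service test and tunes its generation length, reusing only fields from the example above:

```yaml
test_cases:
  chatqna:
    llm:
      run_test: true          # toggle this service's benchmark on
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 512   # generate longer responses to stress the LLM service
        temperature: 0.01
        streaming: true
```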