Support customized prompts and max new tokens in chatqna e2e test (#170)
* Support customized prompts and max new tokens in chatqna e2e test.

* Update default prompts to make input tokens=128 for neuralchat 7b

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README for benchmark tool

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
joshuayao and pre-commit-ci[bot] authored Oct 22, 2024
1 parent 84e077e commit 79a4ad3
Showing 6 changed files with 44 additions and 3 deletions.
2 changes: 2 additions & 0 deletions evals/benchmark/README.md
@@ -119,5 +119,7 @@ test_cases:
# activate if collect_service_metric is true
- "chatqna-backend-server-svc"
dataset: # Activate if random_prompt=true; leave blank for the default dataset (WebQuestions) or set to sharegpt
prompts: # User-customized prompts, activate if random_prompt=false.
max_output: 128 # max number of output tokens
```
If you'd like to use the sharegpt dataset, please download it according to the [guide](https://github.com/lm-sys/FastChat/issues/90#issuecomment-1493250773). Merge all downloaded data files into a single file named sharegpt.json and place it in `evals/benchmark/stresscli/dataset`.
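The merge step can be scripted. Below is a minimal sketch, not part of this commit, that assumes the downloaded parts are JSON lists matching `sg_90k_part*.json` in the current directory; adjust the pattern and paths to whatever the guide gives you.

```python
# Minimal sketch (not part of this commit): merge downloaded ShareGPT part files
# into the single sharegpt.json expected at evals/benchmark/stresscli/dataset.
# The input file pattern and output path are assumptions; adjust them as needed.
import glob
import json

merged = []
for part in sorted(glob.glob("sg_90k_part*.json")):  # assumed naming of the downloaded parts
    with open(part, "r", encoding="utf-8") as f:
        merged.extend(json.load(f))  # each part is assumed to be a JSON list of records

with open("evals/benchmark/stresscli/dataset/sharegpt.json", "w", encoding="utf-8") as f:
    json.dump(merged, f, ensure_ascii=False)

print(f"Merged {len(merged)} records into sharegpt.json")
```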
2 changes: 2 additions & 0 deletions evals/benchmark/benchmark.py
@@ -100,6 +100,8 @@ def create_run_yaml_content(service, base_url, bench_target, test_phase, num_que
"service-metric-collect": test_params["collect_service_metric"],
"service-list": service.get("service_list", []),
"dataset": service.get("dataset", "default"),
"prompts": service.get("prompts", None),
"max-output": service.get("max_output", 128),
"seed": test_params.get("seed", None),
"llm-model": test_params["llm_model"],
"deployment-type": test_params["deployment_type"],
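For context, the two added lookups mirror the optional fields of a test case: if a test case omits `prompts` or `max_output`, the generated run spec falls back to `None` and `128` respectively. A small illustrative sketch with a hypothetical service entry (not taken from the repository):

```python
# Minimal sketch (illustrative only) of how the two new keys fall back to defaults
# when a test case in benchmark.yaml leaves them unset. The service dict below is
# a hypothetical example, not taken from the repository.
service = {
    "service_list": ["chatqna-backend-server-svc"],
    "dataset": None,  # left blank in benchmark.yaml
    "prompts": "What is artificial intelligence?",
    # "max_output" omitted on purpose
}

run_yaml_fragment = {
    "prompts": service.get("prompts", None),       # -> "What is artificial intelligence?"
    "max-output": service.get("max_output", 128),  # -> 128 (default)
}
print(run_yaml_fragment)
```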
4 changes: 3 additions & 1 deletion evals/benchmark/benchmark.yaml
@@ -78,7 +78,9 @@ test_cases:
- "reranking-dependency-svc"
- "retriever-svc"
- "vector-db"
dataset: # Activate if random_prompt=true; leave blank for the default dataset (WebQuestions) or set to sharegpt
prompts: In an increasingly complex world where technology has rapidly advanced and evolved far beyond our wildest dreams, humanity now stands on the precipice of a revolutionary new era that is filled with endless possibilities, profound and significant changes, as well as intricate challenges that we must actively address. The year is now 2050, and artificial intelligence has seamlessly woven itself deeply and intricately into the very fabric of everyday life. Autonomous vehicles glide effortlessly and smoothly through the bustling, vibrant, and lively city streets, while drones swiftly and accurately deliver packages with pinpoint precision, making logistics and delivery systems more efficient, streamlined, and advanced than ever before in the entire history of humankind and technological development. Smart homes, equipped with cutting-edge advanced sensors and sophisticated algorithms, anticipate every possible need and requirement of their inhabitants, creating an environment of unparalleled convenience, exceptional comfort, and remarkable efficiency that enhances our daily lives. However, with these remarkable and groundbreaking advancements come a host of new challenges, uncertainties, and ethical dilemmas that society must confront, navigate, and address in a thoughtful and deliberate manner. As we carefully navigate through this brave new world filled with astonishing technological marvels, innovations, and breakthroughs, questions about the implications and consequences of AI technologies become increasingly pressing, relevant, and urgent for individuals and communities alike. Issues surrounding privacy—how our personal data is collected, securely stored, processed, and utilized—emerge alongside significant concerns about security in a rapidly evolving digital landscape where vulnerabilities can be easily and readily exploited by malicious actors, hackers, and cybercriminals. Moreover, philosophical inquiries regarding the very nature of consciousness itself rise prominently to the forefront of public discourse, debate, and discussion, inviting diverse perspectives, opinions, and ethical considerations from various stakeholders. In light of these profound developments and transformative changes that we are witnessing, I would like to gain a much deeper, broader, and more comprehensive understanding of what artificial intelligence truly is and what it encompasses in its entirety and complexity. Could you elaborate extensively, thoroughly, and comprehensively on its precise definition, its wide-ranging and expansive scope, as well as the myriad and diverse ways it significantly impacts our daily lives, personal experiences, and society as a whole in various dimensions and aspects? # User-customized prompts, activate if random_prompt=false.
max_output: 128 # max number of output tokens

codegen:
llm:
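The commit message notes that the default prompt above was sized so that the input is 128 tokens for neuralchat 7b. If you substitute your own prompt, a quick token count keeps the input length comparable. A minimal sketch, assuming the `Intel/neural-chat-7b-v3-3` tokenizer (the exact model id is not stated in this commit) and a local `transformers` install:

```python
# Minimal sketch (not part of this commit) for checking the token count of a custom
# prompt before putting it in benchmark.yaml. The model id is an assumption;
# any tokenizer matching your deployment works.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/neural-chat-7b-v3-3")  # assumed model id
prompt = "In an increasingly complex world where technology has rapidly advanced ..."  # your prompt
num_tokens = len(tokenizer.encode(prompt))
print(f"Prompt length: {num_tokens} tokens")  # the default prompt in this commit targets 128
```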
12 changes: 12 additions & 0 deletions evals/benchmark/stresscli/commands/load_test.py
@@ -33,6 +33,8 @@
"namespace": "default",
"load-shape": {"name": DEFAULT_LOADSHAPE},
"dataset": "default",
"max-output": 128,
"prompts": "none",
"seed": "none",
}

@@ -134,6 +136,12 @@ def run_locust_test(kubeconfig, global_settings, run_settings, output_folder, in
runspec["namespace"] = run_settings.get("namespace", global_settings.get("namespace", locust_defaults["namespace"]))
runspec["dataset"] = run_settings.get("dataset", global_settings.get("dataset", locust_defaults["dataset"]))
runspec["dataset"] = locust_defaults["dataset"] if runspec["dataset"] is None else runspec["dataset"]
runspec["prompts"] = run_settings.get("prompts", global_settings.get("prompts", locust_defaults["prompts"]))
runspec["prompts"] = locust_defaults["prompts"] if runspec["prompts"] is None else runspec["prompts"]
runspec["max_output"] = run_settings.get(
"max-output", global_settings.get("max-output", locust_defaults["max-output"])
)
runspec["max_output"] = locust_defaults["max-output"] if runspec["max_output"] is None else runspec["max_output"]
runspec["seed"] = run_settings.get("seed", global_settings.get("seed", locust_defaults["seed"]))
runspec["seed"] = locust_defaults["seed"] if runspec["seed"] is None else runspec["seed"]
runspec["run_name"] = run_settings["name"]
@@ -220,6 +228,10 @@ def run_locust_test(kubeconfig, global_settings, run_settings, output_folder, in
load_shape,
"--dataset",
runspec["dataset"],
"--prompts",
runspec["prompts"],
"--max-output",
str(runspec["max_output"]),
"--seed",
str(runspec["seed"]),
"--processes",
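The repeated `.get(...)` chains above all implement the same precedence rule: per-run settings override global settings, which override the built-in defaults, and an explicit `None` in the YAML still falls back to the default. A hypothetical helper that captures the pattern:

```python
# Minimal sketch (illustrative only) of the precedence the lines above implement:
# run settings > global settings > built-in defaults, with a None guard so that
# an empty value in the YAML still resolves to the default.
def resolve(key, run_settings, global_settings, defaults):
    value = run_settings.get(key, global_settings.get(key, defaults[key]))
    return defaults[key] if value is None else value

defaults = {"prompts": "none", "max-output": 128}
print(resolve("prompts", {}, {"prompts": None}, defaults))       # -> "none"
print(resolve("max-output", {"max-output": 256}, {}, defaults))  # -> 256
```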
14 changes: 14 additions & 0 deletions evals/benchmark/stresscli/locust/aistress.py
@@ -64,6 +64,16 @@ def _(parser):
default="none",
help="The seed for all RNGs",
)
parser.add_argument(
"--prompts",
type=str,
env_var="OPEA_EVAL_PROMPTS",
default="In a world where technology has advanced beyond our wildest dreams, humanity stands on the brink of a new era. The year is 2050, and artificial intelligence has become an integral part of everyday life. Autonomous vehicles zip through the streets, drones deliver packages with pinpoint accuracy, and smart homes anticipate every need of their inhabitants. But with these advancements come new challenges and ethical dilemmas. As society grapples with the implications of AI, questions about privacy, security, and the nature of consciousness itself come to the forefront. Please answer me the question what is artificial intelligence.",
help="User-customized prompts",
)
parser.add_argument(
"--max-output", type=int, env_var="OPEA_EVAL_MAX_OUTPUT_TOKENS", default=128, help="Max number of output tokens"
)


reqlist = []
@@ -203,13 +213,17 @@ def on_test_start(environment, **kwargs):
console_logger.info(f"Benchmark target : {environment.parsed_options.bench_target}\n")
console_logger.info(f"Load shape : {environment.parsed_options.load_shape}")
console_logger.info(f"Dataset : {environment.parsed_options.dataset}")
console_logger.info(f"Customized prompt : {environment.parsed_options.prompts}")
console_logger.info(f"Max output tokens : {environment.parsed_options.max_output}")


@events.init.add_listener
def on_locust_init(environment, **_kwargs):
global bench_package
os.environ["OPEA_EVAL_DATASET"] = environment.parsed_options.dataset
os.environ["OPEA_EVAL_SEED"] = environment.parsed_options.seed
os.environ["OPEA_EVAL_PROMPTS"] = environment.parsed_options.prompts
os.environ["OPEA_EVAL_MAX_OUTPUT_TOKENS"] = str(environment.parsed_options.max_output)
try:
bench_package = __import__(environment.parsed_options.bench_target)
except ImportError:
13 changes: 11 additions & 2 deletions evals/benchmark/stresscli/locust/chatqnafixed.py
@@ -1,17 +1,26 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os

import tokenresponse as token

opea_eval_prompts = os.environ["OPEA_EVAL_PROMPTS"]
max_new_tokens = int(os.environ["OPEA_EVAL_MAX_OUTPUT_TOKENS"])


def getUrl():
return "/v1/chatqna"


def getReqData():
global opea_eval_prompts
global max_new_tokens
if opea_eval_prompts == "none":
opea_eval_prompts = "In a world where technology has advanced beyond our wildest dreams, humanity stands on the brink of a new era. The year is 2050, and artificial intelligence has become an integral part of everyday life. Autonomous vehicles zip through the streets, drones deliver packages with pinpoint accuracy, and smart homes anticipate every need of their inhabitants. But with these advancements come new challenges and ethical dilemmas. As society grapples with the implications of AI, questions about privacy, security, and the nature of consciousness itself come to the forefront. Please answer me the question what is artificial intelligence."
return {
"messages": "In a world where technology has advanced beyond our wildest dreams, humanity stands on the brink of a new era. The year is 2050, and artificial intelligence has become an integral part of everyday life. Autonomous vehicles zip through the streets, drones deliver packages with pinpoint accuracy, and smart homes anticipate every need of their inhabitants. But with these advancements come new challenges and ethical dilemmas. As society grapples with the implications of AI, questions about privacy, security, and the nature of consciousness itself come to the forefront. Please answer me about what is artificial intelligence.",
"max_tokens": 128,
"messages": opea_eval_prompts,
"max_tokens": max_new_tokens,
"top_k": 1,
}

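A minimal usage sketch (assumed values, not part of this commit) showing what `getReqData()` now returns once `aistress.py` has exported the two environment variables; run it from `evals/benchmark/stresscli/locust` so that `tokenresponse` is importable.

```python
# Minimal usage sketch with assumed values: the environment variables are set
# before importing chatqnafixed, because the module reads them at import time.
import os

os.environ["OPEA_EVAL_PROMPTS"] = "What is artificial intelligence?"  # assumed prompt
os.environ["OPEA_EVAL_MAX_OUTPUT_TOKENS"] = "64"                      # assumed limit

import chatqnafixed  # requires tokenresponse to be on the import path

print(chatqnafixed.getUrl())      # /v1/chatqna
print(chatqnafixed.getReqData())  # {'messages': 'What is artificial intelligence?', 'max_tokens': 64, 'top_k': 1}
```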
