forked from ggerganov/llama.cpp
server: tests: refactor steps and vocabulary
Showing 2 changed files with 138 additions and 151 deletions.
@@ -1,39 +1,58 @@
 Feature: llama.cpp server
 
-  Background: The server is started and ready to accept prompts
-    When wait for the server to be started
-    Then wait for the server to be healthy
+  Background: Server startup
+    Given a server listening on localhost:8080 with 2 slots
+    Then the server is starting
+    Then the server is healthy
 
-  Scenario: Health endpoint
-    Given an health liveness probe
-    Then the server must be healthy
+  Scenario: Health
+    When the server is healthy
+    Then the server is ready
 
-  Scenario Outline: run a completion request
-    Given a prompt <prompt>
-    When we request a completion
-    Then tokens are predicted
+  Scenario Outline: Completion
+    Given a <prompt> completion request with maximum <n_predict> tokens
+    Then <predicted_n> tokens are predicted
 
     Examples: Prompts
-      | prompt       |
-      | I believe    |
-      | Write a joke |
+      | prompt                           | n_predict | predicted_n |
+      | I believe the meaning of life is | 128       | 128         |
+      | Write a joke about AI            | 512       | 512         |
 
-  Scenario Outline: run a completion on the OAI endpoint
+  Scenario Outline: OAI Compatibility
     Given a system prompt <system_prompt>
     And a user prompt <user_prompt>
     And a model <model>
-    When we request the oai completions endpoint
-    Then the oai response contains completion tokens
+    And <max_tokens> max tokens to predict
+    Given an OAI compatible chat completions request
+    Then <predicted_n> tokens are predicted
 
     Examples: Prompts
-      | model       | system_prompt               | user_prompt                         |
-      | tinyllama-2 | You are ChatGPT.            | Say hello                           |
-      | tinyllama-2 | You are a coding assistant. | Write the fibonacci function in c++ |
+      | model        | system_prompt               | user_prompt                          | max_tokens | predicted_n |
+      | llama-2      | You are ChatGPT.            | Say hello.                           | 64         | 64          |
+      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 512        | 512         |
 
 
-  Scenario: Health endpoint during processing with concurrent requests
-    Given 2 slow concurrent prompts
-    Then wait for all slots processing
-    Then the server is overloaded
-    When wait for all slots idle
-    Then all prompts must be predicted
+  Scenario: Multi users
+    Given a prompt:
+      """
+      Write a formal complaint email to Air France about my delayed
+      baggage from my flight on Tuesday, January 17th, from Paris to Toulouse. Be verbose.
+      """
+    And a prompt:
+      """
+      Translate the following War & Peace chapter into Russian: WELL, PRINCE,
+      Genoa and Lucca are now no more than private estates of the Bonaparte
+      family. No, I warn you, that if you do not tell me we are at war,
+      if you again allow yourself to palliate all the infamies and atrocities
+      of this Antichrist (upon my word, I believe he is), I don’t know you
+      in future, you are no longer my friend, no longer my faithful slave,
+      as you say. There, how do you do, how do you do? I see I’m scaring you,
+      sit down and talk to me.” These words were uttered in July 1805 by
+      Anna Pavlovna Scherer, a distinguished lady of the court,
+      and confidential maid-of-honour to the Empress Marya Fyodorovna.
+      It was her greeting to Prince Vassily, a man high in rank
+      and office, who was the first to arrive at her soirée.
+      """
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    Then all prompts are predicted
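The commit message says the point of the change is a shared step vocabulary. To make that concrete, here is a framework-free sketch of how a BDD runner (such as Python's behave, which drives feature files like this one) resolves the new step phrases to handlers via pattern matching. The step table and handler names below are illustrative assumptions, not the actual step definitions from this commit.

```python
import re

# Hypothetical mapping from the refactored step vocabulary to handler names.
STEPS = [
    (re.compile(r'a server listening on (\S+):(\d+) with (\d+) slots'),
     'server_config'),
    (re.compile(r'a (.+) completion request with maximum (\d+) tokens'),
     'completion_request'),
    (re.compile(r'(\d+) tokens are predicted'),
     'n_tokens_predicted'),
]

def match_step(text):
    # Strip the Gherkin keyword (Given/When/Then/And) before matching.
    body = re.sub(r'^\s*(Given|When|Then|And)\s+', '', text)
    for pattern, name in STEPS:
        m = pattern.fullmatch(body)
        if m:
            return name, m.groups()
    raise ValueError(f'undefined step: {text!r}')

print(match_step('Given a server listening on localhost:8080 with 2 slots'))
# → ('server_config', ('localhost', '8080', '2'))
```

Because every scenario reuses the same phrases ("the server is healthy", "<predicted_n> tokens are predicted"), one binding per phrase covers the whole feature file, which is what makes the refactor pay off.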
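For the "OAI compatible chat completions request" step, the llama.cpp server exposes an OpenAI-compatible endpoint (POST /v1/chat/completions). A sketch of the request body that step would send, filled in with the first Examples row; the helper name is an illustrative assumption:

```python
import json

def build_chat_request(model, system_prompt, user_prompt, max_tokens):
    # OpenAI-style chat payload: system message first, then the user turn.
    return {
        'model': model,
        'max_tokens': max_tokens,
        'messages': [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': user_prompt},
        ],
    }

body = build_chat_request('llama-2', 'You are ChatGPT.', 'Say hello.', 64)
print(json.dumps(body, indent=2))
```

The `<max_tokens>` column maps onto the payload's `max_tokens` field, which is why the Examples rows can assert an exact `predicted_n`: the server stops after at most that many tokens.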