Bump version to v0.3.0 and added some examples to readme (#7)
simedw authored Jul 20, 2023
1 parent 6c37a96 commit 0b9d133
Showing 2 changed files with 17 additions and 1 deletion.
16 changes: 16 additions & 0 deletions README.md
@@ -105,6 +105,7 @@ There are multiple ways to evaluate if the test function's prediction matches the
By default, GPT-3 is used to compare the output. You can use `--evaluator` to select a different method; see the example after the list below.

- `semantic`, checks semantic similarity using language models like GPT-3, GPT-3.5, or GPT-4 (`--model` parameter). Note that this evaluator requires the `OPENAI_API_KEY` environment variable to be set.
- `embedding`, uses cosine distance between embedded vectors. Note that this evaluator also requires the `OPENAI_API_KEY` environment variable to be set.
- `string-match`, checks whether the strings match (case-insensitive)
- `interactive`, the user manually accepts or fails tests in the terminal
- `web`, uses pywebio for a simple local web interface
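
For example, a couple of hypothetical invocations combining these flags (the `examples` suite path and the `gpt-4` model identifier are placeholders, not confirmed defaults):

```
$ bench run examples --evaluator string-match
$ bench run examples --evaluator semantic --model gpt-4
```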
@@ -125,6 +126,21 @@ To accelerate the evaluation process, BenchLLM uses a cache. If a (prediction, e
$ bench run examples --cache memory
```

When developing chains or training agent models, these models may need to interact with external functions, for instance querying a weather forecast or executing an SQL query. In such scenarios, BenchLLM lets you mock these functions, which makes your tests more predictable and helps uncover unexpected function calls.

```yml
input: I live in London, can I expect rain today?
expected: ["no"]
calls:
- name: forecast.get_n_day_weather_forecast
returns: It's sunny in London.
arguments:
location: London
num_days: 1
```

In the example above, the function `get_n_day_weather_forecast` in the `forecast` module is mocked: every time this function is invoked, the model receives `"It's sunny in London."`. BenchLLM also warns you if the function is invoked with argument values that differ from `get_n_day_weather_forecast(location=London, num_days=1)`. Note that providing these argument values is optional.

### 🧮 Eval

While _bench run_ runs each test function and then evaluates its output, it can often be beneficial to separate these into two steps, for example if you want a person to do the evaluation manually or if you want to try multiple evaluation methods on the same function.
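
A hypothetical two-step flow might look like the sketch below; the `--no-eval` flag and the predictions path are assumptions rather than confirmed options, so consult `bench --help` for the exact names:

```
$ bench run examples --no-eval          # run the test functions without evaluating
$ bench eval output/latest/predictions  # evaluate the stored predictions separately
```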
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "benchllm"
version = "0.2.0"
version = "0.3.0"
description = "Tool for testing LLMs"
homepage = "https://github.com/v7labs/benchllm"
authors = [ "Simon Edwardsson <[email protected]>", "Andrea Azzini <[email protected]>"]
