Bump version to v0.3.0 and added some examples to readme (#7)
simedw authored Jul 20, 2023
1 parent 6c37a96 commit 0b9d133
Showing 2 changed files with 17 additions and 1 deletion.
16 changes: 16 additions & 0 deletions README.md
@@ -105,6 +105,7 @@ There are multiple ways to evaluate if the test function's prediction matches the
By default, GPT-3 is used to compare the output. You can use `--evaluator` to select a different method; see the example after the list below.

- `semantic`, checks semantic similarity using language models like GPT-3, GPT-3.5, or GPT-4 (`--model` parameter). Note that this evaluator requires the `OPENAI_API_KEY` environment variable to be set.
- `embedding`, uses cosine distance between embedded vectors. Note that this evaluator also requires the `OPENAI_API_KEY` environment variable to be set.
- `string-match`, checks whether the strings match (case-insensitive)
- `interactive`, the user manually accepts or fails tests in the terminal
- `web`, uses pywebio for a simple local web interface
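
For example, a couple of hypothetical invocations combining these flags (the `examples` suite path and the `gpt-4` model identifier are placeholders, not confirmed defaults):

```
$ bench run examples --evaluator string-match
$ bench run examples --evaluator semantic --model gpt-4
```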
@@ -125,6 +126,21 @@ To accelerate the evaluation process, BenchLLM uses a cache. If a (prediction, e
$ bench run examples --cache memory
```

When developing chains or training agent models, these models may need to interact with external functions, for instance querying a weather forecast or executing an SQL query. In such scenarios, BenchLLM lets you mock these functions, which makes your tests more predictable and helps uncover unexpected function calls.

```yml
input: I live in London, can I expect rain today?
expected: ["no"]
calls:
- name: forecast.get_n_day_weather_forecast
returns: It's sunny in London.
arguments:
location: London
num_days: 1
```

In the example above, the function `get_n_day_weather_forecast` in the `forecast` module is mocked: every time this function is invoked, the model receives `"It's sunny in London."`. BenchLLM also warns you if the function is invoked with argument values that differ from `get_n_day_weather_forecast(location=London, num_days=1)`. Note that providing these argument values is optional.

### 🧮 Eval

While _bench run_ runs each test function and then evaluates its output, it can often be beneficial to separate these into two steps, for example if you want a person to do the evaluation manually or if you want to try multiple evaluation methods on the same function.
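
A hypothetical two-step flow might look like the sketch below; the `--no-eval` flag and the predictions path are assumptions rather than confirmed options, so consult `bench --help` for the exact names:

```
$ bench run examples --no-eval          # run the test functions without evaluating
$ bench eval output/latest/predictions  # evaluate the stored predictions separately
```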
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "benchllm"
version = "0.2.0"
version = "0.3.0"
description = "Tool for testing LLMs"
homepage = "https://github.com/v7labs/benchllm"
authors = [ "Simon Edwardsson <[email protected]>", "Andrea Azzini <[email protected]>"]
