feat: Rename how-to implement incremental evaluation and make it more… #864

Merged
merged 1 commit into from
May 23, 2024
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -6,7 +6,7 @@
...

### New Features
-- Add `how_to_implement_complete_incremental_evaluation_flow`
+- Add `how_to_implement_incremental_evaluation`.

### Fixes
- The document index client now correctly URL-encodes document names in its queries.
2 changes: 1 addition & 1 deletion README.md
@@ -180,7 +180,7 @@ The how-tos are quick lookups about how to do things. Compared to the tutorials,
| [...retrieve data for analysis](./src/documentation/how_tos/how_to_retrieve_data_for_analysis.ipynb) | Retrieve experiment data in multiple different ways |
| [...implement a custom human evaluation](./src/documentation/how_tos/how_to_human_evaluation_via_argilla.ipynb) | Necessary steps to create an evaluation with humans as a judge via Argilla |
| [...implement elo evaluations](./src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb) | Evaluate runs and create ELO ranking for them |
-| [...implement complete incremental evaluation flow](./src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb) | Run complete incremental evaluation flow from runner to aggretation
+| [...implement incremental evaluation](./src/documentation/how_tos/how_to_implement_incremental_evaluation.ipynb) | Implement and run an incremental evaluation
# Models

Currently, we support a bunch of models accessible via the Aleph Alpha API. Depending on your local setup, you may even have additional models available.

This file was deleted.

@@ -0,0 +1,153 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from documentation.how_tos.example_data import (\n",
" DummyAggregationLogic,\n",
" DummyEvaluation,\n",
" DummyExample,\n",
" example_data,\n",
")\n",
"from intelligence_layer.evaluation import (\n",
" Aggregator,\n",
" Example,\n",
" IncrementalEvaluationLogic,\n",
" IncrementalEvaluator,\n",
" InMemoryAggregationRepository,\n",
" InMemoryEvaluationRepository,\n",
" SuccessfulExampleOutput,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to implement incremental evaluation\n",
"This notebook outlines how to perform evaluations incrementally, i.e., how to add runs to your existing evaluations without re-evaluating the runs that were already evaluated.\n",
" \n",
"## Step-by-Step Guide\n",
"0. Run your tasks on the datasets on which you want to evaluate them (see [here](./how_to_run_a_task_on_a_dataset.ipynb))\n",
" - When evaluating multiple runs, all of them need the same data types \n",
"1. Initialize all necessary repositories and define your `IncrementalEvaluationLogic`. It is similar to a normal `EvaluationLogic` (see [here](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)), but you additionally have to implement your own `do_incremental_evaluate` method\n",
"2. Initialize an `IncrementalEvaluator` with the repositories and your custom `IncrementalEvaluationLogic`\n",
"3. Call the `evaluate_runs` method of the `IncrementalEvaluator`\n",
"4. Aggregate your evaluations using the [standard aggregation](./how_to_aggregate_evaluations.ipynb) or using a [custom aggregation logic](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)\n",
"\n",
"#### Steps for adding new runs\n",
"5. Call the `evaluate_additional_runs` method of the `IncrementalEvaluator`:\n",
" - `run_ids`: Runs to be included in the evaluation results, including those that have been evaluated before\n",
" - `previous_evaluation_ids`: IDs of the previous evaluations whose runs should **not** be re-evaluated; how they are skipped depends on your implementation of the `do_incremental_evaluate` method\n",
"6. Aggregate all your `EvaluationOverview`s"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Step 0\n",
"examples = [\n",
" DummyExample(input=\"input1\", expected_output=\"expected_output1\", data=\"data1\")\n",
"]\n",
"my_example_data = example_data()\n",
"\n",
"dataset_repository = my_example_data.dataset_repository\n",
"run_repository = my_example_data.run_repository\n",
"\n",
"# Step 1\n",
"evaluation_repository = InMemoryEvaluationRepository()\n",
"aggregation_repository = InMemoryAggregationRepository()\n",
"\n",
"\n",
"class DummyIncrementalEvaluationLogic(\n",
" IncrementalEvaluationLogic[str, str, str, DummyEvaluation]\n",
"):\n",
" def do_incremental_evaluate(\n",
" self,\n",
" example: Example[str, str],\n",
" outputs: list[SuccessfulExampleOutput[str]],\n",
" already_evaluated_outputs: list[list[SuccessfulExampleOutput[str]]],\n",
" ) -> DummyEvaluation:\n",
" return DummyEvaluation(eval=\"DummyEvalResult\")\n",
"\n",
"\n",
"# Step 2\n",
"incremental_evaluator = IncrementalEvaluator(\n",
" dataset_repository,\n",
" run_repository,\n",
" evaluation_repository,\n",
" \"My incremental evaluation\",\n",
" DummyIncrementalEvaluationLogic(),\n",
")\n",
"\n",
"# Step 3\n",
"incremental_evaluator.evaluate_runs(my_example_data.run_overview_1.id)\n",
"\n",
"# Step 4\n",
"aggregation_logic = DummyAggregationLogic()\n",
"aggregator = Aggregator(\n",
" evaluation_repository, aggregation_repository, \"MyAggregator\", aggregation_logic\n",
")\n",
"aggregation_overview = aggregator.aggregate_evaluation(\n",
" *evaluation_repository.evaluation_overview_ids()\n",
")\n",
"print(aggregation_overview)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Addition of new task/run\n",
"# Step 5\n",
"# Assumption: `example_data()` also provides a second run as `run_overview_2`\n",
"run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_2.id]\n",
"incremental_evaluator.evaluate_additional_runs(\n",
" *run_ids,\n",
" previous_evaluation_ids=evaluation_repository.evaluation_overview_ids(),\n",
")\n",
"\n",
"# Step 6\n",
"second_aggregation_overview = aggregator.aggregate_evaluation(\n",
" *evaluation_repository.evaluation_overview_ids()\n",
")\n",
"print(second_aggregation_overview)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
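
The `DummyIncrementalEvaluationLogic` in the notebook ignores `already_evaluated_outputs` and always returns the same result. As a rough illustration of what a non-trivial `do_incremental_evaluate` might do with that argument, here is a minimal sketch that skips outputs from runs covered by earlier evaluations. It deliberately does not import `intelligence_layer`: `Output` and `incremental_evaluate` are hypothetical stand-ins, and only the parameter names `outputs` and `already_evaluated_outputs` are taken from the notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Hypothetical stand-in for the successful output of one run."""

    run_id: str
    text: str


def incremental_evaluate(
    expected: str,
    outputs: list[Output],
    already_evaluated_outputs: list[list[Output]],
) -> dict[str, bool]:
    """Score only the outputs whose run was not evaluated before."""
    # Collect the run ids that earlier evaluations already covered.
    seen_run_ids = {
        output.run_id
        for previous in already_evaluated_outputs
        for output in previous
    }
    # Evaluate just the outputs that belong to new runs.
    return {
        output.run_id: output.text == expected
        for output in outputs
        if output.run_id not in seen_run_ids
    }


first = [Output("run-1", "4")]
second = first + [Output("run-2", "5")]

# Initial evaluation: nothing has been evaluated yet.
initial_result = incremental_evaluate("4", first, [])
# Incremental evaluation: "run-1" is passed again but is skipped.
incremental_result = incremental_evaluate("4", second, [first])
print(initial_result)
print(incremental_result)
```

In this sketch the second call receives both runs, mirroring how `run_ids` in Step 5 includes previously evaluated runs, yet only the new run is scored.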