WIP: trying out different models

Aleph-Alpha · Apr 4, 2024 · commit 95033f3
1 parent 88efa0a
Showing 1 changed file with 102 additions and 25 deletions.
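Note on the top-label cell: the notebook line `sorted(list(o.scores.items()), key=lambda i: i[1], reverse=True)[0]` picks the highest-scoring label for each output. Below is a minimal standalone sketch of that selection. It assumes each output's `scores` behaves like a plain `dict[str, float]`. The data and the names `scores_per_email` and `top_label` are hypothetical and not part of this commit or of the library.

```python
# Hypothetical stand-in data: one dict of label -> score per e-mail.
# In the notebook these would come from `o.scores` for each output.
scores_per_email = [
    {"Finance": 0.2, "IT Support": 0.7, "Communications": 0.1},
    {"Finance": 0.5, "IT Support": 0.1, "Communications": 0.4},
]


def top_label(scores: dict[str, float]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    # max() with a key avoids sorting the whole dict just to take one item.
    return max(scores.items(), key=lambda item: item[1])


top_labels = [top_label(scores) for scores in scores_per_email]
print(top_labels)
```

Using `max` instead of sorting and indexing returns the same pair, including when scores tie, since both approaches keep the first of the tied items.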
diff --git a/src/examples/user_journey.ipynb b/src/examples/user_journey.ipynb
@@ -47,7 +47,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To start off, we are only given a few anecdotal examples. Let's see how far we can get with these.\n"
+    "To start off, we are only given a few anecdotal examples.\n",
+    "We have two e-mails and a list of departments to which each one could be routed.\n",
+    "\n",
+    "Let's have a look.\n"
    ]
   },
   {
@@ -102,15 +105,23 @@
     "# instantiating the default task\n",
     "prompt_based_classify = PromptBasedClassify()\n",
     "\n",
+    "# building the input object for each example\n",
     "classify_inputs = [\n",
     "    ClassifyInput(chunk=TextChunk(example), labels=labels) for example in examples\n",
     "]\n",
     "\n",
-    "\n",
+    "# running the tasks concurrently\n",
     "outputs = prompt_based_classify.run_concurrently(classify_inputs, InMemoryTracer())\n",
     "outputs"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Hmm, we have some results, but they aren't really legible (yet)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -128,6 +139,17 @@
     "[sorted(list(o.scores.items()), key=lambda i: i[1], reverse=True)[0] for o in outputs]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It appears that the Finance Department can fix my laptop and the Comms people can award free credits...\n",
+    "We probably have to do some fine-tuning of our classification approach.\n",
+    "\n",
+    "However, let's first make sure that this result is not merely anecdotal.\n",
+    "For this, we need to run an evaluation. Luckily, we now have access to a few more examples...\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -137,29 +159,21 @@
     "It appears that the Finance Department can fix my laptop and the Comms people can award free credits...\n",
     "We probably have to do some fine-tuning of our classification approach.\n",
     "\n",
-    "    },\n",
-    "    {\n",
-    "        \"label\": \"Sales\",\n",
-    "        \"message\": \"Jonas, we have met each other at the event in Nürnberg, can we meet for a follow up in your Office in Heidelberg?\"\n",
     "\n",
-    "    },\n",
-    "    {\n",
-    "        \"label\": \"Security\",\n",
-    "        \"message\": \"Your hTTPs Certificate is not valid on your www.aleph-alpha.de\"\n",
-    "    },\n",
-    "    {\n",
-    "        \"label\": \"HR\",\n",
-    "        \"message\": \"I want to take a week off immediatly\"\n",
-    "    },\n",
-    "    {\n",
-    "        \"label\": \"HR\",\n",
-    "        \"message\": \"I want to take a sabbatical\"\n",
-    "    },\n",
-    "    {\n",
-    "        \"label\": \"HR\",\n",
-    "        \"message\": \"How can I work more, I want to work weekends, can I get paid overtime?\"\n",
-    "    }\n",
-    "]"
+    "with open(\"data/classify_examples.json\", \"r\") as file:\n",
+    "    labeled_examples: list[dict[str, str]] = json.load(file)\n",
+    "\n",
+    "labeled_examples"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The Intelligence Layer supports running task evaluations.\n",
+    "\n",
+    "First, we have to create a dataset inside a repository.\n",
+    "Different repositories persist datasets in different ways, but an `InMemoryDatasetRepository` will do for now.\n"
    ]
   },
   {
@@ -211,6 +225,13 @@
     "When a dataset is created, we generate a unique ID. We'll need it later."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When a dataset is created, we generate a unique ID. We'll need it later."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -220,6 +241,13 @@
     "dataset_id"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that we have a dataset, let's actually run an evaluation on it!\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -264,7 +292,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "eval_overview = evaluator.evaluate_runs(run_overview.id)"
+    "run_overview = runner.run_dataset(dataset_id)\n",
+    "run_overview"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, let's evaluate this run."
    ]
   },
   {
@@ -429,6 +465,47 @@
     "Let's run the cleaned dataset using this task..."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The prompt used for the `PromptBasedClassify` task looks as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(prompt_based_classify.instruction)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can probably improve this task by making the prompt more specific, like so:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adjusted_prompt = \"\"\"Identify the department that would be responsible for handling the given request.\n",
+    "Reply with only the department name.\"\"\"\n",
+    "prompt_adjusted_classify = PromptBasedClassify(instruction=adjusted_prompt)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's run the cleaned dataset using this task..."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,