
Commit

Deberta notebook fix: parallelize pipelined model (gradient-ai#43)
`parallelize()` must be called in order to add poptorch block annotations to the model layers. Without this, the model will only run on 1 IPU.

---------

Co-authored-by: Alexandre Payot <[email protected]>
kundaMwiza and payoto authored May 24, 2023
1 parent c73e15d commit 8601a67
Showing 1 changed file with 17 additions and 4 deletions.
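
For context, here is a minimal sketch of the corrected inference flow this commit introduces. The `to_pipelined`, `IPUConfig`, and `poptorch.inferenceModel(..., options=...)` calls are taken from the diff below; the import paths are assumptions based on the optimum-graphcore library and are not part of the commit itself.

```
# Sketch only: import paths are inferred from optimum-graphcore, not from this diff.
import poptorch
from transformers import DebertaForQuestionAnswering
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

model = DebertaForQuestionAnswering.from_pretrained("Palak/microsoft_deberta-base_squad")
ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir="./exe_cache")

# The fix: eval() plus parallelize() adds the poptorch block annotations that
# split the layers across the IPUs requested by ipus_per_replica; without
# parallelize() the whole model executes on a single IPU.
pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()
pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))
```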
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "2f37d919-8e25-4149-9f94-6aeebce8d2cd",
    "metadata": {},
@@ -34,14 +35,18 @@
     "oracle(question=\"Where do I live?\", context=\"My name is Wolfgang and I live in Berlin\")\n",
     "```\n",
     "\n",
-    "However in some cases such as MNLI, there is no off-the-shelf pipeline ready to use. In this case, you could simply instantiate the model, use the optimum-specific call `to_pipelined` to pipeline the model according to the `IPUConfig`, and prepare it for inference using `poptorch.inferenceModel()`.\n",
+    "However in some cases such as MNLI, there is no off-the-shelf pipeline ready to use. In this case, you could simply:\n",
+    "- Instantiate the model with the correct execution mode\n",
+    "- Use the optimum-specific call `to_pipelined` to return the model with changes and annotations for running on the IPU\n",
+    "- Set the model to run in `eval` mode and use the `parallelize` method on the new model to parallelize it across IPUs\n",
+    "- Prepare it for inference using `poptorch.inferenceModel()`\n",
     "\n",
     "```\n",
     "model = DebertaForQuestionAnswering.from_pretrained(\"Palak/microsoft_deberta-base_squad\")\n",
     "\n",
     "ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=\"./exe_cache\")\n",
-    "pipelined_model = to_pipelined(model, ipu_config)\n",
-    "pipelined_model = poptorch.inferenceModel(pipelined_model)\n",
+    "pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()\n",
+    "pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))\n",
     "```\n",
     "\n",
     "This method is demoed in this notebook, as Huggingface do not natively support the MNLI inference task."
@@ -151,6 +156,14 @@
     "model.half()"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3bd484d3",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -295,7 +308,7 @@
    "outputs": [],
    "source": [
     "ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=executable_cache_dir)\n",
-    "pipelined_model = to_pipelined(model, ipu_config).parallelize()\n",
+    "pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()\n",
    "pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))"
   ]
  },
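
To round out the flow described in the notebook's markdown cell, here is a hypothetical usage sketch of the parallelized model, reusing the question/context strings from the pipeline example in the diff above; the tokenizer checkpoint and the keyword-argument call are assumptions, not part of this commit.

```
# Hypothetical usage: tokenizer checkpoint and call signature are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Palak/microsoft_deberta-base_squad")
inputs = tokenizer(
    "Where do I live?",  # question string from the notebook's pipeline example
    "My name is Wolfgang and I live in Berlin",  # context string from the same example
    return_tensors="pt",
)
# The first call compiles an IPU executable (cached via executable_cache_dir);
# subsequent calls with the same input shapes reuse it.
outputs = pipelined_model(**inputs)
```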
