From 8601a67dbde1ca040ba4746e9e05bc4b3667115c Mon Sep 17 00:00:00 2001
From: Mwiza <43536864+kundaMwiza@users.noreply.github.com>
Date: Wed, 24 May 2023 09:57:20 +0100
Subject: [PATCH] Deberta notebook fix: parallelize pipelined model (#43)

Parallelize must be called in order to add poptorch block annotations to
model layers. Without this the model will only run on 1 IPU.

---------

Co-authored-by: Alexandre Payot <18074599+payoto@users.noreply.github.com>
---
 .../deberta-blog-notebook.ipynb | 21 +++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/natural-language-processing/other-use-cases/deberta-blog-notebook.ipynb b/natural-language-processing/other-use-cases/deberta-blog-notebook.ipynb
index d82d80f..c7eb101 100644
--- a/natural-language-processing/other-use-cases/deberta-blog-notebook.ipynb
+++ b/natural-language-processing/other-use-cases/deberta-blog-notebook.ipynb
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "2f37d919-8e25-4149-9f94-6aeebce8d2cd",
    "metadata": {},
@@ -34,14 +35,18 @@
     "oracle(question=\"Where do I live?\", context=\"My name is Wolfgang and I live in Berlin\")\n",
     "```\n",
     "\n",
-    "However in some cases such as MNLI, there is no off-the-shelf pipeline ready to use. In this case, you could simply instantiate the model, use the optimum-specific call `to_pipelined` to pipeline the model according to the `IPUConfig`, and prepare it for inference using `poptorch.inferenceModel()`.\n",
+    "However in some cases such as MNLI, there is no off-the-shelf pipeline ready to use. In this case, you could simply:\n",
+    "- Instantiate the model with the correct execution mode\n",
+    "- Use the optimum-specific call `to_pipelined` to return the model with changes and annotations for running on the IPU\n",
+    "- Set the model to run in `eval` mode and use the `parallelize` method on the new model to parallelize it across IPUs\n",
+    "- Prepare it for inference using `poptorch.inferenceModel()`\n",
     "\n",
     "```\n",
     "model = DebertaForQuestionAnswering.from_pretrained(\"Palak/microsoft_deberta-base_squad\")\n",
     "\n",
     "ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=\"./exe_cache\")\n",
-    "pipelined_model = to_pipelined(model, ipu_config)\n",
-    "pipelined_model = poptorch.inferenceModel(pipelined_model)\n",
+    "pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()\n",
+    "pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))\n",
     "```\n",
     "\n",
     "This method is demoed in this notebook, as Huggingface do not natively support the MNLI inference task."
@@ -151,6 +156,14 @@
     "model.half()"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3bd484d3",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -295,7 +308,7 @@
    "outputs": [],
    "source": [
     "ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=executable_cache_dir)\n",
-    "pipelined_model = to_pipelined(model, ipu_config).parallelize()\n",
+    "pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()\n",
     "pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))"
    ]
   },
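
For reference, a minimal end-to-end sketch of the corrected flow described in this patch. The checkpoint name, `IPUConfig` values, and the `.eval().parallelize()` / `poptorch.inferenceModel(...)` calls come from the notebook diff above; the import paths (in particular `to_pipelined` from `optimum.graphcore.modeling_utils`) are assumptions and require an IPU environment with `poptorch` and `optimum-graphcore` installed.

```python
# Sketch only: assumes poptorch and optimum-graphcore are available;
# import locations are assumptions, not taken from the patch itself.
import poptorch
from transformers import DebertaForQuestionAnswering
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

model = DebertaForQuestionAnswering.from_pretrained("Palak/microsoft_deberta-base_squad")

ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir="./exe_cache")

# parallelize() is the fix this patch makes: it adds the poptorch block
# annotations so the layers are pipelined across the 2 IPUs requested by
# ipus_per_replica; without it the whole model runs on a single IPU.
pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()
pipelined_model = poptorch.inferenceModel(
    pipelined_model, options=ipu_config.to_options(for_inference=True)
)
```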