Commit
Add deberta entry in deploy YAML (gradient-ai#36)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: @lukem-gc
payoto and github-actions[bot] authored May 19, 2023
1 parent f38f647 commit 308a0c3
Showing 3 changed files with 372 additions and 0 deletions.
23 changes: 23 additions & 0 deletions .github/deployment-configs/deploy-deberta.yaml
@@ -0,0 +1,23 @@
_optimum_graphcore_repository: &_optimum_graphcore_repository
  origin: https://github.com/huggingface/optimum-graphcore.git
  ref: main

_current_repo_in_github_actions: &_current_repo_in_github_actions
  origin: notebooks/
  ref: null

deberta-lukem:
  source:
    paths:
    - expression: '*'
      path: notebooks/deberta-blog-notebook.ipynb
      recursive: true
    repository:
      <<: *_optimum_graphcore_repository
      prefix: notebooks/
  target:
    renames: {}
    repository:
      <<: *_current_repo_in_github_actions
      prefix: natural-language-processing/other-use-cases/
7 changes: 7 additions & 0 deletions .gradient/notebook-tests.yaml
@@ -102,6 +102,13 @@ nlp-external_model:
    file: external_model.ipynb
    timeout: 10000

nlp-deberta-blog:
  location: ../natural-language-processing/other-use-cases
  generated: true
  notebook:
    file: deberta-blog-notebook.ipynb
    timeout: 1000

# Stable diffusion
stable-diffusion-image_to_image:
  location: ../stable-diffusion
342 changes: 342 additions & 0 deletions natural-language-processing/other-use-cases/deberta-blog-notebook.ipynb
@@ -0,0 +1,342 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2f37d919-8e25-4149-9f94-6aeebce8d2cd",
"metadata": {},
"source": [
"# SQuAD and MNLI on IPUs using DeBERTa - Inference\n",
"\n",
"This notebook provides an implementation of two natural language understanding (NLU) tasks using small, efficient models: [Microsoft DeBERTa-base](https://arxiv.org/abs/2006.03654) for sequence classification and question answering. The notebook demonstrates how these models can achieve good performance on standard benchmarks while being relatively lightweight and easy to use. \n",
"\n",
"The two NLU tasks covered in this notebook are:\n",
"- Multi-Genre Natural Language Inference (MNLI) - a sentence-pair classification task\n",
"- Stanford Question Answering Dataset (SQuAD) - a question answering task\n",
"\n",
"Hardware requirements: The models show each DeBERTa Base model running on two IPUs. If correctly configured, these models could both be served simultaneously on an IPU POD4.\n",
"\n",
"[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/pNJdMj) [![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "afe52060",
"metadata": {},
"source": [
"##### Optimum Graphcore\n",
"The notebook also demonstrates [Optimum Graphcore](https://github.com/huggingface/optimum-graphcore). Optimum Graphcore is the interface between the Hugging Face Transformers library and [Graphcore IPUs](https://www.graphcore.ai/products/ipu). This notebook demonstrates a more explicit way of using Huggingface models with the IPU. This method is particularly useful when the task in question is not supported by the Huggingface pipelines API.\n",
"\n",
"The easiest way to run a Huggingface inference model would be to instantiate the pipeline as follows:\n",
"\n",
"```\n",
"oracle = pipeline(model=\"Palak/microsoft_deberta-base_squad\")\n",
"oracle(question=\"Where do I live?\", context=\"My name is Wolfgang and I live in Berlin\")\n",
"```\n",
"\n",
"However in some cases such as MNLI, there is no off-the-shelf pipeline ready to use. In this case, you could simply instantiate the model, use the optimum-specific call `to_pipelined` to pipeline the model according to the `IPUConfig`, and prepare it for inference using `poptorch.inferenceModel()`.\n",
"\n",
"```\n",
"model = DebertaForQuestionAnswering.from_pretrained(\"Palak/microsoft_deberta-base_squad\")\n",
"\n",
"ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=\"./exe_cache\")\n",
"pipelined_model = to_pipelined(model, ipu_config)\n",
"pipelined_model = poptorch.inferenceModel(pipelined_model)\n",
"```\n",
"\n",
"This method is demoed in this notebook, as Huggingface do not natively support the MNLI inference task."
]
},
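{
"attachments": {},
"cell_type": "markdown",
"id": "3f7b8a25",
"metadata": {},
"source": [
"For tasks that do have pipeline support, Optimum Graphcore also offers an IPU-aware pipeline wrapper. As a minimal sketch (assuming your installed version exposes `optimum.graphcore.pipeline`), the question-answering example above could run on IPUs with:\n",
"\n",
"```python\n",
"from optimum.graphcore import pipeline\n",
"\n",
"oracle = pipeline(\"question-answering\", model=\"Palak/microsoft_deberta-base_squad\")\n",
"oracle(question=\"Where do I live?\", context=\"My name is Wolfgang and I live in Berlin\")\n",
"```"
]
},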
{
"attachments": {},
"cell_type": "markdown",
"id": "6227fd68-3108-4ac2-9ef2-b1fbbe069d74",
"metadata": {},
"source": [
"## Setup\n",
"Install the optimum library"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d132efc-0d4a-4647-af51-f4bde32eeeb7",
"metadata": {},
"outputs": [],
"source": [
"%pip install \"optimum-graphcore>=0.6, <0.7\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "14ed09d3",
"metadata": {},
"source": [
"We read some configuration from the environment to support environments like Paperspace Gradient."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05ca39a7",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"executable_cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"./exe_cache\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e014263b-9a6e-4c94-8a0f-8b692fa67bc6",
"metadata": {},
"source": [
"Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "271eb456-4392-4471-8540-510ed2048a27",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import torch\n",
"from datasets import load_dataset, Dataset\n",
"\n",
"import poptorch\n",
"from optimum.graphcore import IPUConfig\n",
"from optimum.graphcore.modeling_utils import to_pipelined\n",
"\n",
"from transformers import BartForConditionalGeneration, BartTokenizerFast\n",
"from transformers import DebertaForSequenceClassification, DebertaTokenizerFast\n",
"from transformers import DebertaForQuestionAnswering, AutoTokenizer"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "49d94da7-894b-4e98-a52f-eaab56b55b80",
"metadata": {},
"source": [
"## Multi-Genre Natural Language Inference (MNLI)\n",
"\n",
"MNLI is a sentence-pair classification task, where the goal is to predict whether a given hypothesis is true (entailment) or false (contradiction) given a premise. The task has been proposed as a benchmark for evaluating natural language understanding models. \n",
"\n",
"In this notebook, we use the Microsoft DeBERTa-base model to classify pairs of sentences on the MNLI task. We first load the model and the tokenizer, then prepare an example input. Finally, we execute the model on an IPU device using PopTorch and obtain the predicted probabilities for the entailment classes.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "37e87ec8",
"metadata": {},
"source": [
"First, load the model and tokeniser from the Huggingface Model Hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0d336657",
"metadata": {},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(\"microsoft/deberta-base-mnli\")\n",
"model = DebertaForSequenceClassification.from_pretrained(\"microsoft/deberta-base-mnli\")\n",
"model.half()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d0b186f7",
"metadata": {},
"source": [
"Create some example inputs, and encoder those using the tokeniser"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b27fdc19",
"metadata": {},
"outputs": [],
"source": [
"premise = \"A man inspects the uniform of a figure in some East Asian country.\"\n",
"hypothesis = \"The man is in an East Asian country.\"\n",
"\n",
"inputs = tokenizer.encode(\n",
" premise, hypothesis, return_tensors=\"pt\", truncation_strategy=\"only_first\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ce5dbbd4",
"metadata": {},
"source": [
"Configure the instantiated model to run on IPUs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12b347ca",
"metadata": {},
"outputs": [],
"source": [
"ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.6, executable_cache_dir=executable_cache_dir)\n",
"pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()\n",
"pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "514245c3",
"metadata": {},
"source": [
"Run the MNLI model and print the probability of entailment. We calculate this by throwing away neutral (index 1) and running softmax over the remaining logits."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd77176c",
"metadata": {},
"outputs": [],
"source": [
"logits = pipelined_model(inputs)[0]\n",
"entail_contradiction_logits = logits[:, [0, 2]]\n",
"prob_label_is_true = entail_contradiction_logits.softmax(dim=1)[:, 1]\n",
"print(prob_label_is_true)"
]
},
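{
"attachments": {},
"cell_type": "markdown",
"id": "7c1a2e90",
"metadata": {},
"source": [
"As a quick sanity check, you can also print the probabilities of all three MNLI classes. This is a minimal sketch, assuming the checkpoint ships the standard Hugging Face `config.id2label` mapping; it reuses the `logits` computed above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e4f5a61",
"metadata": {},
"outputs": [],
"source": [
"# Softmax over all three logits: contradiction, neutral and entailment\n",
"all_probs = logits.softmax(dim=1)[0]\n",
"for idx, label in model.config.id2label.items():\n",
"    print(f\"{label}: {all_probs[idx].item():.4f}\")"
]
},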
{
"attachments": {},
"cell_type": "markdown",
"id": "397a067b-6e64-4c70-8ee3-3ed9a1b974a3",
"metadata": {},
"source": [
"## Stanford Question Answering Dataset (SQuAD)\n",
"\n",
"SQuAD is a question answering dataset consisting of questions posed on a set of Wikipedia articles. The goal is to answer the questions by highlighting the span of text in the corresponding article. \n",
"\n",
"In this notebook, we use the Microsoft DeBERTa-base model to answer a question given a context paragraph. We first load the model and the tokenizer, then prepare an example input. Finally, we execute the model on an IPU device using PopTorch and obtain the predicted answer span.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3899807a",
"metadata": {},
"source": [
"First, load the model and tokeniser from the Huggingface Model Hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0c22942",
"metadata": {},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(\"Palak/microsoft_deberta-base_squad\")\n",
"model = DebertaForQuestionAnswering.from_pretrained(\n",
" \"Palak/microsoft_deberta-base_squad\"\n",
")\n",
"model.half()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "80555e44",
"metadata": {},
"source": [
"Create some example inputs, and tokenise them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e383bbe9",
"metadata": {},
"outputs": [],
"source": [
"question = \"What's my name?\"\n",
"context = \"My name is Clara Smith and I live in Berkeley.\"\n",
"\n",
"inputs = tokenizer.encode(\n",
" question, context, return_tensors=\"pt\", truncation_strategy=\"only_first\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d5474a4b",
"metadata": {},
"source": [
"Configure the instantiated model to run on IPUs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1bccf13c-2bdb-4d0a-b4ad-9b9dc700e4bc",
"metadata": {},
"outputs": [],
"source": [
"ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir=executable_cache_dir)\n",
"pipelined_model = to_pipelined(model, ipu_config).parallelize()\n",
"pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4e125178",
"metadata": {},
"source": [
"Run the SQuAD model and print the identified span"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8157508f",
"metadata": {},
"outputs": [],
"source": [
"outputs = pipelined_model(inputs)\n",
"pred_tokens = inputs[0, outputs.start_logits.argmax() : outputs.end_logits.argmax() + 1]\n",
"print(tokenizer.decode(pred_tokens))"
]
},
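{
"attachments": {},
"cell_type": "markdown",
"id": "5b8d0c42",
"metadata": {},
"source": [
"Once you are done, you can release the IPUs for other workloads by detaching the compiled executor. This is a minimal sketch using PopTorch's `detachFromDevice()`; it detaches the SQuAD executor currently referenced by `pipelined_model`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d6e9f13",
"metadata": {},
"outputs": [],
"source": [
"# Detach the executor from the IPUs to free them for other workloads\n",
"pipelined_model.detachFromDevice()"
]
},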
{
"attachments": {},
"cell_type": "markdown",
"id": "fb5feafd",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"DeBERTa is a powerful natural language understanding model that makes efficient use of compute resources and parameters. It is demonstrated in this notebook on textual entailment and question answering tasks - which are valuable building blocks for larger NLP systems."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
