chore: bump version to 3.5.0
senysenyseny16 authored Dec 4, 2023
1 parent 59630af commit 5f4ffbb
Showing 30 changed files with 636 additions and 7,711 deletions.
@@ -4,18 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatic quantization for enot-lite\n",
"## Automatic quantization\n",
"\n",
"This notebook demonstrates simple end2end pipeline for MobileNetV2 quantization.\n",
"This notebook demonstrates simple end-two-end pipeline for MobileNetV2 quantization.\n",
"\n",
"Our quantization process consists of quantized model calibration, quantization threshold adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using [enot-lite](https://enot-lite.rtd.enot.ai/en/latest/) framework.\n",
"Our quantization process consists of quantized model calibration, quantization threshold adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using ONNX Runtime framework.\n",
"\n",
"### Main chapters of this notebook:\n",
"1. Setup the environment\n",
"1. Prepare dataset and create dataloaders\n",
"1. Evaluate pretrained MobileNetV2 from torchvision\n",
"1. End2end quantization with our framework\n",
"1. Inference using enot-lite with TensorRT int8 backend\n",
"1. Inference using ONNX Runtime with TensorRT Execution Provider\n",
"\n",
"Before running this example make sure that TensorRT supports your GPU for int8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
]
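The compute-capability requirement mentioned above can be checked from PyTorch before running the notebook. A minimal sketch, not part of the notebook itself, assuming a single-GPU machine:

    import torch

    # Warn early if the GPU is unlikely to support fast INT8 TensorRT inference.
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f'CUDA compute capability: {major}.{minor}')
        if (major, minor) <= (6, 1):
            print('This GPU may not support INT8 TensorRT inference (needs > 6.1).')
    else:
        print('No CUDA device is visible to PyTorch.')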
@@ -29,6 +29,15 @@
"First, let's set up the environment and make some common imports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -66,9 +75,9 @@
"from enot.quantization import calibrate\n",
"from enot.quantization import distill\n",
"\n",
"# TensorRT inference:\n",
"from enot_lite.backend import BackendFactory\n",
"from enot_lite.type import BackendType"
"# ONNX Runtime inference:\n",
"from tutorial_utils.inference import create_onnxruntime_session\n",
"import onnxsim"
]
},
{
@@ -288,14 +297,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inference using enot-lite with TensorRT int8 backend"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For **enot-lite**, we should export our quantized model to onnx:"
"## Inference using ONNX Runtime with TensorRT Execution Provider"
]
},
{
@@ -304,44 +306,23 @@
"metadata": {},
"outputs": [],
"source": [
"fake_quantized_model.cpu()\n",
"torch.onnx.export(\n",
" model=fake_quantized_model,\n",
" model=fake_quantized_model.cpu(),\n",
" args=torch.zeros(25, 3, 224, 224),\n",
" f='exported_model.onnx',\n",
" opset_version=13,\n",
" input_names=['input'],\n",
" output_names=['output'],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize **enot-lite** inference session with TensorRT Int8 Execution Provider:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"torch.cuda.empty_cache() # Empty PyTorch CUDA cache before running enot-lite.\n",
")\n",
"\n",
"sess = BackendFactory().create('exported_model.onnx', BackendType.ORT_TENSORRT)"
"proto, _ = onnxsim.simplify('exported_model.onnx')"
]
},
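onnxsim.simplify also returns a validity flag that the cell above discards. An optional sketch of checking that flag and keeping the simplified model on disk; the output file name is illustrative, not part of the tutorial:

    import onnx
    import onnxsim

    # Simplify the exported graph, verify the result, and keep a copy on disk.
    proto, ok = onnxsim.simplify('exported_model.onnx')
    assert ok, 'onnxsim could not validate the simplified model'
    onnx.save(proto, 'exported_model_simplified.onnx')  # illustrative file name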
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First TensorRT run is usually slow because it chooses the best algorithms for inference.\n",
"\n",
"Let's run session once before validation:"
"Initialize ONNX Runtime inference session with TensorRT Execution Provider:"
]
},
{
@@ -352,14 +333,20 @@
},
"outputs": [],
"source": [
"sess(torch.zeros((25, 3, 224, 224), dtype=torch.float32, device='cuda'));"
"torch.cuda.empty_cache() # Empty PyTorch CUDA cache before running ONNX Runtime.\n",
"\n",
"sess = create_onnxruntime_session(\n",
" proto=proto,\n",
" input_sample=torch.zeros(25, 3, 224, 224, device='cuda'),\n",
" output_shape=(25, 1000),\n",
")"
]
},
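create_onnxruntime_session comes from this repository's tutorial_utils, and its implementation is not shown in this diff. A rough sketch of an equivalent session built directly on the onnxruntime API, under the assumption that the helper enables the TensorRT Execution Provider with INT8 and warms the engine up once; the provider options are assumptions, not the helper's actual code:

    import numpy as np
    import onnxruntime as ort

    # Assumed equivalent of the tutorial helper: TensorRT first, CUDA as fallback.
    session = ort.InferenceSession(
        proto.SerializeToString(),  # `proto` is the simplified model from the cell above
        providers=[
            ('TensorrtExecutionProvider', {'trt_int8_enable': True}),  # option name assumed
            'CUDAExecutionProvider',
        ],
    )

    # The first TensorRT run builds engines and is slow, so warm up once before timing.
    dummy = np.zeros((25, 3, 224, 224), dtype=np.float32)
    session.run(['output'], {'input': dummy})

The input and output names match the torch.onnx.export call earlier in this notebook.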
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluate quantized model on TensorRT:"
"Evaluate quantized model on TensorRT Execution Provider:"
]
},
{
@@ -369,8 +356,7 @@
"outputs": [],
"source": [
"def model_fn(inputs):\n",
" trt_output = sess.run(inputs)[0]\n",
" return trt_output\n",
" return sess(inputs)\n",
"\n",
"\n",
"val_loss, val_accuracy = eval_model(model_fn, validation_dataloader)\n",
11 changes: 10 additions & 1 deletion 2. Tutorial - pruning.ipynb
@@ -27,6 +27,15 @@
"First, let's set up the environment and make some common imports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -62,7 +71,7 @@
"# Training:\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
"from torch.optim import RAdam\n",
"from tutorial_utils.phases import tutorial_train_loop\n",
"from tutorial_utils.train import tutorial_train_loop\n",
"from tutorial_utils.train import WarmupScheduler\n",
"\n",
"# Pruning:\n",
119 changes: 82 additions & 37 deletions 3. Tutorial - Ultralytics YOLO-v5 quantization.ipynb
@@ -5,21 +5,21 @@
"id": "1fda310e",
"metadata": {},
"source": [
"## Automatic quantization and optimized inference for YOLO-v5 with enot-lite backend\n",
"## Automatic quantization and optimized inference for YOLOv5 with ONNX Runtime (TensorrRT Execution Provider)\n",
"\n",
"This notebook demonstrates simple procedure for Ultralytics Yolo-v5 quantization.\n",
"This notebook demonstrates simple procedure for Ultralytics YOLOv5 quantization.\n",
"\n",
"Our quantization process consists of quantized model calibration, quantization thresholds adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using [enot-lite](https://enot-lite.rtd.enot.ai/en/stable/) framework.\n",
"Our quantization process consists of quantized model calibration, quantization thresholds adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using ONNX Runtime framework.\n",
"\n",
"### Main chapters of this notebook:\n",
"1. Setup the environment\n",
"1. Prepare dataset and create dataloaders\n",
"1. Baseline Yolo-v5 onnx creation\n",
"1. Quantize Yolo-v5\n",
"1. Measure speed of default YOLO inferenced via default pytorch and quantized YOLO inferenced via enot-lite with TensorRT int8 backend.\n",
"1. Baseline YOLOv5 ONNX creation\n",
"1. Quantize YOLOv5\n",
"1. Measure speed of default YOLOv5 inferenced via default PyTorch and quantized YOLOv5 inferenced via ONNX Runtime (TensorRT)\n",
"1. Measure mAP for float and quantized versions\n",
"\n",
"Before running this example make sure that TensorRT supports your GPU for int8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
"Before running this example make sure that TensorRT supports your GPU for INT8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
]
},
{
@@ -39,7 +39,8 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install pyyaml"
"!pip install -r requirements.txt\n",
"!pip install 'numpy<1.24'"
]
},
{
@@ -58,7 +59,7 @@
"id": "bae062b0",
"metadata": {},
"source": [
"1. Install enot-autodl and enot-lite libraries and create jupyter kernel with them.\n",
"1. Install enot-autodl and ONNX Runtime libraries and create jupyter kernel with them.\n",
"2. Clone specific commit from YOLOv5 repository: https://github.com/ultralytics/yolov5/commit/f76a78e7078185ecdc67470d8658103cf2067c81\n",
"3. Replace the val.py script with our val.py\n",
"4. Replace path to COCO dataset folder in 'yolov5/data/coco.yaml' file. If you do not have pre-downloaded MS COCO dataset - you can leave it as is and the dataset will be automatically downloaded.\n",
@@ -90,7 +91,9 @@
"sys.path.append('yolov5/')\n",
"\n",
"import itertools\n",
"import statistics\n",
"import numpy as np\n",
"from timeit import Timer\n",
"\n",
"import torch\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
@@ -107,9 +110,8 @@
"from enot.quantization import RMSELoss\n",
"\n",
"# optimized inference\n",
"from enot_lite.benchmark import Benchmark\n",
"from enot_lite.type import BackendType\n",
"from enot_lite.type import ModelType\n",
"from tutorial_utils.inference import create_onnxruntime_session\n",
"import onnxsim\n",
"\n",
"# converters from onnx to pytorch\n",
"from onnx2torch import convert\n",
@@ -147,7 +149,7 @@
"source": [
"HOME_DIR = Path.home() / '.optimization_experiments'\n",
"DATASETS_DIR = HOME_DIR / 'datasets/coco_for_yolo'\n",
"PROJECT_DIR = HOME_DIR / 'enot-lite_quantization'\n",
"PROJECT_DIR = HOME_DIR / 'yolov5s_quantization'\n",
"QUANT_ONNX_PATH = './yolov5s_trt_int8.onnx'\n",
"ONNX_PATH = './yolov5s.onnx'\n",
"\n",
@@ -360,10 +362,9 @@
" model=fake_quantized_model,\n",
" args=torch.ones(*IMG_SHAPE),\n",
" f=QUANT_ONNX_PATH,\n",
" input_names=['images'],\n",
" input_names=['input'],\n",
" output_names=['output'],\n",
" opset_version=13,\n",
" dynamic_axes={'images': {0: 'batch_size'}},\n",
")"
]
},
@@ -392,32 +393,79 @@
"metadata": {},
"outputs": [],
"source": [
"yolov5s = attempt_load('yolov5s.pt')"
"yolov5s = attempt_load('yolov5s.pt').cuda()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f19d9e0d",
"id": "c4b0cc45-236a-4784-aa96-7473a69793d1",
"metadata": {},
"outputs": [],
"source": [
"def measure_fps(infer):\n",
" for _ in range(50): # warmup\n",
" infer()\n",
"\n",
" number = 50\n",
" measurements = Timer(infer).repeat(repeat=50, number=number)\n",
" norm = statistics.mean(measurements) / number / BATCH_SIZE\n",
" fps = 1.0 / norm\n",
" return fps"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b1dee9c-ffad-4f77-ae21-966ee98f6a23",
"metadata": {},
"outputs": [],
"source": [
"inputs = torch.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32, device='cuda')\n",
"\n",
"\n",
"def infer_torch():\n",
" yolov5s(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "860f55c9-d347-479b-979d-e5fbb78c56f9",
"metadata": {},
"outputs": [],
"source": [
"torch_input = torch.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32).cpu()\n",
"onnx_input = {'images': np.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=np.float32)}\n",
"\n",
"benchmark = Benchmark(\n",
" batch_size=BATCH_SIZE,\n",
" torch_model=yolov5s,\n",
" torch_input=torch_input,\n",
" backends=[\n",
" BackendType.TORCH_CUDA,\n",
" (BackendType.AUTO_GPU, ModelType.YOLO_V5),\n",
" ],\n",
" onnx_model=QUANT_ONNX_PATH,\n",
" onnx_input=onnx_input,\n",
"proto, _ = onnxsim.simplify(QUANT_ONNX_PATH)\n",
"\n",
"onnxruntime_sess = create_onnxruntime_session(\n",
" proto=proto,\n",
" input_sample=inputs,\n",
" output_shape=(BATCH_SIZE, 25200, 85),\n",
")\n",
"benchmark.run()\n",
"benchmark.print_results()"
"\n",
"\n",
"def infer_onnxruntime():\n",
" onnxruntime_sess(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f19d9e0d",
"metadata": {},
"outputs": [],
"source": [
"measure_fps(infer_torch) # PyTorch FPS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37b83fe4-956a-47ca-9a21-47505f2140b1",
"metadata": {},
"outputs": [],
"source": [
"measure_fps(infer_onnxruntime) # ONNX Runtime (TensorRT Execution Provider) FPS"
]
},
{
Expand Down Expand Up @@ -493,11 +541,8 @@
"metadata": {},
"outputs": [],
"source": [
"opt['use_enot_lite'] = True\n",
"opt['enot_lite_weights'] = QUANT_ONNX_PATH\n",
"opt['half'] = False\n",
"opt['device'] = 'cpu'\n",
"opt['batch_size'] = 8"
"opt['onnxruntime_sess'] = onnxruntime_sess\n",
"opt['half'] = False"
]
},
{
Expand Down