chore: bump version to 3.5.0
senysenyseny16 authored Dec 4, 2023
1 parent 59630af commit 5f4ffbb
Showing 30 changed files with 636 additions and 7,711 deletions.
@@ -4,18 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatic quantization for enot-lite\n",
"## Automatic quantization\n",
"\n",
"This notebook demonstrates simple end2end pipeline for MobileNetV2 quantization.\n",
"This notebook demonstrates simple end-two-end pipeline for MobileNetV2 quantization.\n",
"\n",
"Our quantization process consists of quantized model calibration, quantization threshold adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using [enot-lite](https://enot-lite.rtd.enot.ai/en/latest/) framework.\n",
"Our quantization process consists of quantized model calibration, quantization threshold adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using ONNX Runtime framework.\n",
"\n",
"### Main chapters of this notebook:\n",
"1. Setup the environment\n",
"1. Prepare dataset and create dataloaders\n",
"1. Evaluate pretrained MobileNetV2 from torchvision\n",
"1. End2end quantization with our framework\n",
"1. Inference using enot-lite with TensorRT int8 backend\n",
"1. Inference using ONNX Runtime with TensorRT Execution Provider\n",
"\n",
"Before running this example make sure that TensorRT supports your GPU for int8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
]
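The compute-capability requirement mentioned above can be checked from PyTorch before running the notebook. A minimal sketch, not part of the notebook itself, assuming a single-GPU machine:

    import torch

    # Warn early if the GPU is unlikely to support fast INT8 TensorRT inference.
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f'CUDA compute capability: {major}.{minor}')
        if (major, minor) <= (6, 1):
            print('This GPU may not support INT8 TensorRT inference (needs > 6.1).')
    else:
        print('No CUDA device is visible to PyTorch.')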
@@ -29,6 +29,15 @@
"First, let's set up the environment and make some common imports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -66,9 +75,9 @@
"from enot.quantization import calibrate\n",
"from enot.quantization import distill\n",
"\n",
"# TensorRT inference:\n",
"from enot_lite.backend import BackendFactory\n",
"from enot_lite.type import BackendType"
"# ONNX Runtime inference:\n",
"from tutorial_utils.inference import create_onnxruntime_session\n",
"import onnxsim"
]
},
{
@@ -288,14 +297,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inference using enot-lite with TensorRT int8 backend"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For **enot-lite**, we should export our quantized model to onnx:"
"## Inference using ONNX Runtime with TensorRT Execution Provider"
]
},
{
@@ -304,44 +306,23 @@
"metadata": {},
"outputs": [],
"source": [
"fake_quantized_model.cpu()\n",
"torch.onnx.export(\n",
" model=fake_quantized_model,\n",
" model=fake_quantized_model.cpu(),\n",
" args=torch.zeros(25, 3, 224, 224),\n",
" f='exported_model.onnx',\n",
" opset_version=13,\n",
" input_names=['input'],\n",
" output_names=['output'],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize **enot-lite** inference session with TensorRT Int8 Execution Provider:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"torch.cuda.empty_cache() # Empty PyTorch CUDA cache before running enot-lite.\n",
")\n",
"\n",
"sess = BackendFactory().create('exported_model.onnx', BackendType.ORT_TENSORRT)"
"proto, _ = onnxsim.simplify('exported_model.onnx')"
]
},
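onnxsim.simplify also returns a validity flag that the cell above discards. An optional sketch of checking that flag and keeping the simplified model on disk; the output file name is illustrative, not part of the tutorial:

    import onnx
    import onnxsim

    # Simplify the exported graph, verify the result, and keep a copy on disk.
    proto, ok = onnxsim.simplify('exported_model.onnx')
    assert ok, 'onnxsim could not validate the simplified model'
    onnx.save(proto, 'exported_model_simplified.onnx')  # illustrative file name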
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First TensorRT run is usually slow because it chooses the best algorithms for inference.\n",
"\n",
"Let's run session once before validation:"
"Initialize ONNX Runtime inference session with TensorRT Execution Provider:"
]
},
{
@@ -352,14 +333,20 @@
},
"outputs": [],
"source": [
"sess(torch.zeros((25, 3, 224, 224), dtype=torch.float32, device='cuda'));"
"torch.cuda.empty_cache() # Empty PyTorch CUDA cache before running ONNX Runtime.\n",
"\n",
"sess = create_onnxruntime_session(\n",
" proto=proto,\n",
" input_sample=torch.zeros(25, 3, 224, 224, device='cuda'),\n",
" output_shape=(25, 1000),\n",
")"
]
},
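create_onnxruntime_session comes from this repository's tutorial_utils, and its implementation is not shown in this diff. A rough sketch of an equivalent session built directly on the onnxruntime API, under the assumption that the helper enables the TensorRT Execution Provider with INT8 and warms the engine up once; the provider options are assumptions, not the helper's actual code:

    import numpy as np
    import onnxruntime as ort

    # Assumed equivalent of the tutorial helper: TensorRT first, CUDA as fallback.
    session = ort.InferenceSession(
        proto.SerializeToString(),  # `proto` is the simplified model from the cell above
        providers=[
            ('TensorrtExecutionProvider', {'trt_int8_enable': True}),  # option name assumed
            'CUDAExecutionProvider',
        ],
    )

    # The first TensorRT run builds engines and is slow, so warm up once before timing.
    dummy = np.zeros((25, 3, 224, 224), dtype=np.float32)
    session.run(['output'], {'input': dummy})

The input and output names match the torch.onnx.export call earlier in this notebook.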
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluate quantized model on TensorRT:"
"Evaluate quantized model on TensorRT Execution Provider:"
]
},
{
@@ -369,8 +356,7 @@
"outputs": [],
"source": [
"def model_fn(inputs):\n",
" trt_output = sess.run(inputs)[0]\n",
" return trt_output\n",
" return sess(inputs)\n",
"\n",
"\n",
"val_loss, val_accuracy = eval_model(model_fn, validation_dataloader)\n",
11 changes: 10 additions & 1 deletion 2. Tutorial - pruning.ipynb
@@ -27,6 +27,15 @@
"First, let's set up the environment and make some common imports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -62,7 +71,7 @@
"# Training:\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
"from torch.optim import RAdam\n",
"from tutorial_utils.phases import tutorial_train_loop\n",
"from tutorial_utils.train import tutorial_train_loop\n",
"from tutorial_utils.train import WarmupScheduler\n",
"\n",
"# Pruning:\n",
119 changes: 82 additions & 37 deletions 3. Tutorial - Ultralytics YOLO-v5 quantization.ipynb
@@ -5,21 +5,21 @@
"id": "1fda310e",
"metadata": {},
"source": [
"## Automatic quantization and optimized inference for YOLO-v5 with enot-lite backend\n",
"## Automatic quantization and optimized inference for YOLOv5 with ONNX Runtime (TensorrRT Execution Provider)\n",
"\n",
"This notebook demonstrates simple procedure for Ultralytics Yolo-v5 quantization.\n",
"This notebook demonstrates simple procedure for Ultralytics YOLOv5 quantization.\n",
"\n",
"Our quantization process consists of quantized model calibration, quantization thresholds adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using [enot-lite](https://enot-lite.rtd.enot.ai/en/stable/) framework.\n",
"Our quantization process consists of quantized model calibration, quantization thresholds adjustment and weight fine-tuning using distillation. Finally, we demonstrate inference of our quantized model using ONNX Runtime framework.\n",
"\n",
"### Main chapters of this notebook:\n",
"1. Setup the environment\n",
"1. Prepare dataset and create dataloaders\n",
"1. Baseline Yolo-v5 onnx creation\n",
"1. Quantize Yolo-v5\n",
"1. Measure speed of default YOLO inferenced via default pytorch and quantized YOLO inferenced via enot-lite with TensorRT int8 backend.\n",
"1. Baseline YOLOv5 ONNX creation\n",
"1. Quantize YOLOv5\n",
"1. Measure speed of default YOLOv5 inferenced via default PyTorch and quantized YOLOv5 inferenced via ONNX Runtime (TensorRT)\n",
"1. Measure mAP for float and quantized versions\n",
"\n",
"Before running this example make sure that TensorRT supports your GPU for int8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
"Before running this example make sure that TensorRT supports your GPU for INT8 inference (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix))."
]
},
{
@@ -39,7 +39,8 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install pyyaml"
"!pip install -r requirements.txt\n",
"!pip install 'numpy<1.24'"
]
},
{
@@ -58,7 +59,7 @@
"id": "bae062b0",
"metadata": {},
"source": [
"1. Install enot-autodl and enot-lite libraries and create jupyter kernel with them.\n",
"1. Install enot-autodl and ONNX Runtime libraries and create jupyter kernel with them.\n",
"2. Clone specific commit from YOLOv5 repository: https://github.com/ultralytics/yolov5/commit/f76a78e7078185ecdc67470d8658103cf2067c81\n",
"3. Replace the val.py script with our val.py\n",
"4. Replace path to COCO dataset folder in 'yolov5/data/coco.yaml' file. If you do not have pre-downloaded MS COCO dataset - you can leave it as is and the dataset will be automatically downloaded.\n",
@@ -90,7 +91,9 @@
"sys.path.append('yolov5/')\n",
"\n",
"import itertools\n",
"import statistics\n",
"import numpy as np\n",
"from timeit import Timer\n",
"\n",
"import torch\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
@@ -107,9 +110,8 @@
"from enot.quantization import RMSELoss\n",
"\n",
"# optimized inference\n",
"from enot_lite.benchmark import Benchmark\n",
"from enot_lite.type import BackendType\n",
"from enot_lite.type import ModelType\n",
"from tutorial_utils.inference import create_onnxruntime_session\n",
"import onnxsim\n",
"\n",
"# converters from onnx to pytorch\n",
"from onnx2torch import convert\n",
@@ -147,7 +149,7 @@
"source": [
"HOME_DIR = Path.home() / '.optimization_experiments'\n",
"DATASETS_DIR = HOME_DIR / 'datasets/coco_for_yolo'\n",
"PROJECT_DIR = HOME_DIR / 'enot-lite_quantization'\n",
"PROJECT_DIR = HOME_DIR / 'yolov5s_quantization'\n",
"QUANT_ONNX_PATH = './yolov5s_trt_int8.onnx'\n",
"ONNX_PATH = './yolov5s.onnx'\n",
"\n",
@@ -360,10 +362,9 @@
" model=fake_quantized_model,\n",
" args=torch.ones(*IMG_SHAPE),\n",
" f=QUANT_ONNX_PATH,\n",
" input_names=['images'],\n",
" input_names=['input'],\n",
" output_names=['output'],\n",
" opset_version=13,\n",
" dynamic_axes={'images': {0: 'batch_size'}},\n",
")"
]
},
@@ -392,32 +393,79 @@
"metadata": {},
"outputs": [],
"source": [
"yolov5s = attempt_load('yolov5s.pt')"
"yolov5s = attempt_load('yolov5s.pt').cuda()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f19d9e0d",
"id": "c4b0cc45-236a-4784-aa96-7473a69793d1",
"metadata": {},
"outputs": [],
"source": [
"def measure_fps(infer):\n",
" for _ in range(50): # warmup\n",
" infer()\n",
"\n",
" number = 50\n",
" measurements = Timer(infer).repeat(repeat=50, number=number)\n",
" norm = statistics.mean(measurements) / number / BATCH_SIZE\n",
" fps = 1.0 / norm\n",
" return fps"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b1dee9c-ffad-4f77-ae21-966ee98f6a23",
"metadata": {},
"outputs": [],
"source": [
"inputs = torch.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32, device='cuda')\n",
"\n",
"\n",
"def infer_torch():\n",
" yolov5s(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "860f55c9-d347-479b-979d-e5fbb78c56f9",
"metadata": {},
"outputs": [],
"source": [
"torch_input = torch.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32).cpu()\n",
"onnx_input = {'images': np.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=np.float32)}\n",
"\n",
"benchmark = Benchmark(\n",
" batch_size=BATCH_SIZE,\n",
" torch_model=yolov5s,\n",
" torch_input=torch_input,\n",
" backends=[\n",
" BackendType.TORCH_CUDA,\n",
" (BackendType.AUTO_GPU, ModelType.YOLO_V5),\n",
" ],\n",
" onnx_model=QUANT_ONNX_PATH,\n",
" onnx_input=onnx_input,\n",
"proto, _ = onnxsim.simplify(QUANT_ONNX_PATH)\n",
"\n",
"onnxruntime_sess = create_onnxruntime_session(\n",
" proto=proto,\n",
" input_sample=inputs,\n",
" output_shape=(BATCH_SIZE, 25200, 85),\n",
")\n",
"benchmark.run()\n",
"benchmark.print_results()"
"\n",
"\n",
"def infer_onnxruntime():\n",
" onnxruntime_sess(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f19d9e0d",
"metadata": {},
"outputs": [],
"source": [
"measure_fps(infer_torch) # PyTorch FPS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37b83fe4-956a-47ca-9a21-47505f2140b1",
"metadata": {},
"outputs": [],
"source": [
"measure_fps(infer_onnxruntime) # ONNX Runtime (TensorRT Execution Provider) FPS"
]
},
{
Expand Down Expand Up @@ -493,11 +541,8 @@
"metadata": {},
"outputs": [],
"source": [
"opt['use_enot_lite'] = True\n",
"opt['enot_lite_weights'] = QUANT_ONNX_PATH\n",
"opt['half'] = False\n",
"opt['device'] = 'cpu'\n",
"opt['batch_size'] = 8"
"opt['onnxruntime_sess'] = onnxruntime_sess\n",
"opt['half'] = False"
]
},
{
Expand Down