Optimizing TensorFlow models with the Neural Network Compression Framework (NNCF) of OpenVINO™ by 8-bit quantization.
This tutorial demonstrates how to use NNCF 8-bit quantization to optimize a TensorFlow model for inference with the OpenVINO Toolkit. For more advanced usage, refer to these examples.
To speed up download and training, this tutorial uses a ResNet-18 model with the Imagenette dataset. Imagenette is a subset of 10 easily classified classes from the ImageNet dataset.
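As background for the term "8-bit quantization" used above, the following is a minimal sketch of the underlying arithmetic: floating-point values are mapped to signed 8-bit integers through a scale factor and then mapped back. This is an illustration only, not NNCF's implementation. The helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and a symmetric per-tensor scheme is assumed for simplicity.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 if all values are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, used only to show the round trip.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The round-trip error for each value is bounded by about half a quantization step (`scale / 2`). That small loss of precision is why the workflow in this tutorial includes a fine-tuning step after quantization.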
This tutorial consists of the following steps:
- Fine-tuning of the FP32 model.
- Transforming the original FP32 model to INT8.
- Using fine-tuning to restore the accuracy.
- Exporting the optimized and original models to Frozen Graph and then to OpenVINO.
- Measuring and comparing the performance of the models.
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.