
Commit

DOCS Update optimization docs with NNCF PTQ changes and deprecation of POT (openvinotoolkit#17398)

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update home.rst

* Update ptq_introduction.md

* Update Introduction.md

* Update Introduction.md

* Update Introduction.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update model_optimization_guide.md

* Update ptq_introduction.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update Introduction.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update ptq_introduction.md

* Update Introduction.md

* Update model_optimization_guide.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update Introduction.md

* Update FrequentlyAskedQuestions.md

* Update model_optimization_guide.md

* Update Introduction.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* added code snippet (#1)

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update model_optimization_guide.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* Delete ptq_introduction.md

* Update FrequentlyAskedQuestions.md

* Update Introduction.md

* Update quantization_w_accuracy_control.md

* Update introduction.md

* Update basic_quantization_flow.md code blocks

* Update quantization_w_accuracy_control.md code snippets

* Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py

Co-authored-by: Alexander Suslov <[email protected]>

* Update model_optimization_guide.md

* Optimization docs proofreading  (#2)

* images updated

* delete reminder

* review

* text review

* change images to original ones

* Update filter_pruning.md code blocks

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update images (openvinotoolkit#3)

* images updated

* delete reminder

* review

* text review

* change images to original ones

* Update filter_pruning.md code blocks

* update images

* resolve conflicts

* resolve conflicts

* change images to original ones

* resolve conflicts

* update images

* fix conflicts

* Update model_optimization_guide.md

* Update docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py

Co-authored-by: Alexander Suslov <[email protected]>

* Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py

Co-authored-by: Alexander Suslov <[email protected]>

* Update docs/optimization_guide/nncf/ptq/code/ptq_onnx.py

Co-authored-by: Alexander Suslov <[email protected]>

* Update docs/optimization_guide/nncf/ptq/code/ptq_aa_openvino.py

Co-authored-by: Alexander Suslov <[email protected]>

* Update docs/optimization_guide/nncf/ptq/code/ptq_openvino.py

Co-authored-by: Alexander Suslov <[email protected]>

* table format fix

* Update headers

* Update qat.md code blocks

---------

Co-authored-by: Alexander Suslov <[email protected]>
Co-authored-by: Tatiana Savina <[email protected]>
3 people authored May 19, 2023
1 parent 8b215ca commit 2513db5
Showing 36 changed files with 664 additions and 467 deletions.
4 changes: 2 additions & 2 deletions docs/_static/images/DEVELOPMENT_FLOW_V3_crunch.svg
4 changes: 2 additions & 2 deletions docs/_static/images/WHAT_TO_USE.svg
4 changes: 2 additions & 2 deletions docs/_static/images/workflow_simple.svg
2 changes: 1 addition & 1 deletion docs/home.rst
@@ -69,7 +69,7 @@ You can integrate and offload to accelerators additional operations for pre- and
Model Quantization and Compression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Post-Training Optimization Tool and Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.
Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.

.. panels::
:card: homepage-panels
28 changes: 9 additions & 19 deletions docs/optimization_guide/model_optimization_guide.md
@@ -8,40 +8,30 @@

ptq_introduction
tmo_introduction
(Experimental) Protecting Model <pot_ranger_README>


Model optimization is an optional offline step of improving final model performance by applying special optimization methods, such as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
Model optimization is an optional offline step of improving the final model performance and reducing the model size by applying special optimization methods, such as 8-bit quantization, pruning, etc. OpenVINO offers two optimization paths implemented in `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__:

- :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` implements most of the optimization parameters to a model by default. Yet, you are free to configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocess of a model (:doc:`Embedding Preprocessing Computation <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`).
- :doc:`Post-training Quantization <ptq_introduction>` is designed to optimize the inference of deep learning models by applying the post-training 8-bit integer quantization that does not require model retraining or fine-tuning.

- :doc:`Post-training Quantization <pot_introduction>` is designed to optimize inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, for example, post-training 8-bit integer quantization.
- :doc:`Training-time Optimization <tmo_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods like Quantization-aware Training, Structured and Unstructured Pruning, etc.

- :doc:`Training-time Optimization <nncf_ptq_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods, like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
.. note:: OpenVINO also supports optimized models (for example, quantized) from source frameworks such as PyTorch, TensorFlow, and ONNX (in Q/DQ format). No special steps are required in this case and optimized models can be converted to the OpenVINO Intermediate Representation format (IR) right away.

Post-training Quantization is the fastest way to optimize a model and should be applied first, but it is limited in terms of achievable accuracy-performance trade-off. In case of poor accuracy or performance after Post-training Quantization, Training-time Optimization can be used as an option.

Detailed workflow:
##################

To understand which development optimization tool you need, refer to the diagram:
Once the model is optimized using the aforementioned methods, it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/DEVELOPMENT_FLOW_V3_crunch.svg

Post-training methods are limited in terms of achievable accuracy-performance trade-off for optimizing models. In this case, training-time optimization with NNCF is an option.

Once the model is optimized using the aforementioned tools it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/WHAT_TO_USE.svg

Post-training methods are limited in terms of achievable accuracy, which may degrade for certain scenarios. In such cases, training-time optimization with NNCF may give better results.

Once the model has been optimized using the aforementioned tools, it can be used for inference using the regular OpenVINO inference workflow. No changes to the code are required.

If you are not familiar with model optimization methods, refer to :doc:`post-training methods <pot_introduction>`.

Additional Resources
####################

- :doc:`Post-training Quantization <ptq_introduction>`
- :doc:`Training-time Optimization <tmo_introduction>`
- :doc:`Deployment optimization <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`
- `HuggingFace Optimum Intel <https://huggingface.co/docs/optimum/intel/optimization_ov>`__

@endsphinxdirective
220 changes: 122 additions & 98 deletions docs/optimization_guide/nncf/filter_pruning.md
@@ -5,15 +5,15 @@
Introduction
####################

Filter pruning is an advanced optimization method which allows reducing computational complexity of the model by removing
redundant or unimportant filters from convolutional operations of the model. This removal is done in two steps:
Filter pruning is an advanced optimization method that allows reducing the computational complexity of the model by removing
redundant or unimportant filters from the convolutional operations of the model. This removal is done in two steps:

1. Unimportant filters are zeroed out by the NNCF optimization with fine-tuning.

2. Zero filters are removed from the model during the export to OpenVINO Intermediate Representation (IR).
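
The two steps above can be illustrated with a small, framework-free sketch. It is not the NNCF implementation: the L2-norm importance criterion, the helper names ``zero_out_filters`` and ``remove_zero_filters``, and the toy weights are all assumptions made for illustration only.

```python
import math


def l2_norm(filt):
    """L2 norm of a filter given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in filt))


def zero_out_filters(filters, pruning_rate):
    """Step 1: set the least important filters to zero (layer shape is unchanged)."""
    num_to_prune = int(len(filters) * pruning_rate)
    # Filter indices sorted from least to most important (assumed criterion: L2 norm).
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    pruned = set(order[:num_to_prune])
    return [[0.0] * len(f) if i in pruned else list(f) for i, f in enumerate(filters)]


def remove_zero_filters(filters):
    """Step 2: physically drop all-zero filters, which shrinks the layer."""
    return [f for f in filters if any(w != 0.0 for w in f)]


# Toy convolution with 4 filters of 3 weights each (hypothetical values).
conv_filters = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.03, -0.02, 0.01],
]

zeroed = zero_out_filters(conv_filters, pruning_rate=0.5)
compact = remove_zero_filters(zeroed)
print(len(zeroed), len(compact))  # prints: 4 2
```

Only the second step reduces the amount of computation, which is why the zero filters have to be removed during conversion before any speed-up can be observed.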


Filter Pruning method from the NNCF can be used stand-alone but we usually recommend to stack it with 8-bit quantization for
Filter Pruning method from the NNCF can be used stand-alone but we usually recommend stacking it with 8-bit quantization for
two reasons. First, 8-bit quantization is the best method in terms of achieving the highest accuracy-performance trade-offs so
stacking it with filter pruning can give even better performance results. Second, applying quantization along with filter
pruning does not hurt accuracy a lot since filter pruning removes noisy filters from the model which narrows down values
@@ -37,44 +37,52 @@ Here, we show the basic steps to modify the training script for the model and us

In this step, NNCF-related imports are added in the beginning of the training script:

.. tab:: PyTorch
.. tab-set::

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [imports]
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [imports]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [imports]
.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [imports]

2. Create NNCF configuration
++++++++++++++++++++++++++++

Here, you should define NNCF configuration which consists of model-related parameters (`"input_info"` section) and parameters
of optimization methods (`"compression"` section).

.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [nncf_congig]

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [nncf_congig]

Here is a brief description of the required parameters of the Filter Pruning method. For full description refer to the
.. tab-set::

.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [nncf_congig]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [nncf_congig]
Here is a brief description of the required parameters of the Filter Pruning method. For a full description refer to the
`GitHub <https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md>`__ page.

* ``pruning_init`` - initial pruning rate target. For example, the value ``0.1`` means that at the beginning of training, convolutions that can be pruned will have 10% of their filters set to zero.

* ``pruning_target`` - pruning rate target at the end of the schedule. For example, the value ``0.5`` means that at the epoch with the number of ``num_init_steps + pruning_steps``, convolutions that can be pruned will have 50% of their filters set to zero.

* ``pruning_steps` - the number of epochs during which the pruning rate target is increased from ``pruning_init` to ``pruning_target`` value. We recommend to keep the highest learning rate during this period.
* ``pruning_steps`` - the number of epochs during which the pruning rate target is increased from the ``pruning_init`` to the ``pruning_target`` value. We recommend keeping the highest learning rate during this period.
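
As a rough sketch of how these parameters fit together, the configuration can be written as a plain Python dictionary. The numeric values and the input shape are hypothetical, and the key layout (``pruning_init`` next to ``algorithm``, the other parameters under ``params``) reflects the NNCF pruning documentation linked above as best understood here, so check it against the NNCF version you use.

```python
# Hypothetical values for illustration; adjust them to your model and training schedule.
nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, 224, 224]},  # assumed input shape
    "compression": {
        "algorithm": "filter_pruning",
        "pruning_init": 0.1,        # 10% of prunable filters zeroed at the start
        "params": {
            "pruning_target": 0.5,  # 50% of prunable filters zeroed at the end
            "num_init_steps": 3,    # assumed number of initial epochs
            "pruning_steps": 10,    # epochs over which the rate grows
        },
    },
}

# Epoch at which the pruning rate reaches ``pruning_target``,
# following the description of ``pruning_target`` above.
params = nncf_config_dict["compression"]["params"]
target_epoch = params["num_init_steps"] + params["pruning_steps"]
print(target_epoch)  # prints: 13

# With NNCF installed, the dictionary would typically be wrapped like this:
# from nncf import NNCFConfig
# nncf_config = NNCFConfig.from_dict(nncf_config_dict)
```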


3. Apply optimization methods
@@ -86,39 +94,44 @@ that can be used the same way as the original model. It is worth noting that opt
so that the model undergoes a set of corresponding transformations and can contain additional operations required for the
optimization.

.. tab-set::

.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [wrap_model]

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [wrap_model]
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [wrap_model]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [wrap_model]

4. Fine-tune the model
++++++++++++++++++++++

This step assumes that you will apply fine-tuning to the model the same way as it is done for the baseline model. In the case
of Filter Pruning method we recommend using the training schedule and learning rate similar to what was used for the training
of original model.

of the original model.

.. tab:: PyTorch
.. tab-set::

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [tune_model]
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [tune_model]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [tune_model]
.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [tune_model]


5. Multi-GPU distributed training
@@ -127,38 +140,43 @@ of original model.
In the case of distributed multi-GPU training (not DataParallel), you should call ``compression_ctrl.distributed()`` before
fine-tuning. This call informs the optimization methods that they need to make adjustments to function in the distributed mode.


.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [distributed]

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [distributed]


.. tab-set::

.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [distributed]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [distributed]
6. Export pruned model
++++++++++++++++++++++

When fine-tuning finishes, the pruned model can be exported to the corresponding format for further inference: ONNX in
the case of PyTorch and frozen graph in the case of TensorFlow 2.

.. tab-set::

.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [export]

.. tab:: TensorFlow 2
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [export]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [export]
.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [export]


These were the basic steps for applying the Filter Pruning method from the NNCF. However, in some cases it is required to save/load model
@@ -170,57 +188,63 @@ checkpoints during the training. Since NNCF wraps the original model with its ow

To save a model checkpoint, use the following API:

.. tab-set::

.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [save_checkpoint]

.. tab:: TensorFlow 2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [save_checkpoint]
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [save_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [save_checkpoint]

8. (Optional) Restore from checkpoint
+++++++++++++++++++++++++++++++++++++

To restore the model from a checkpoint, use the following API:

.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [load_checkpoint]

.. tab:: TensorFlow 2
.. tab-set::

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [load_checkpoint]
.. tab-item:: PyTorch
:sync: pytorch

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
:language: python
:fragment: [load_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [load_checkpoint]

For more details on saving/loading checkpoints in the NNCF, see the following
`documentation <https://github.com/openvinotoolkit/nncf/blob/develop/docs/Usage.md#saving-and-loading-compressed-models>`__.

Deploying pruned model
######################

The pruned model requres an extra step that should be done to get performance improvement. This step involves removal of the
zero filters from the model. This is done at the model conversion step using :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` tool when model is converted from the framework representation (ONNX, TensorFlow, etc.) to OpenVINO Intermediate Representation.
The pruned model requires an extra step that should be done to get a performance improvement. This step involves the removal of the
zero filters from the model. This is done at the model conversion step using :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` tool when the model is converted from the framework representation (ONNX, TensorFlow, etc.) to OpenVINO Intermediate Representation.

* To remove zero filters from the pruned model add the following parameter to the model convertion command: ``--transform=Pruning``
* To remove zero filters from the pruned model add the following parameter to the model conversion command: ``--transform=Pruning``

After that the model can be deployed with OpenVINO in the same way as the baseline model.
After that, the model can be deployed with OpenVINO in the same way as the baseline model.
For more details about model deployment with OpenVINO, see the corresponding :doc:`documentation <openvino_docs_OV_UG_OV_Runtime_User_Guide>`.
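
A minimal sketch of the conversion step, assuming the ``mo`` command-line tool is available on ``PATH``. The file names ``pruned_model.onnx`` and ``ir_output`` and the helper ``build_mo_command`` are hypothetical; only the ``--transform=Pruning`` parameter comes from the text above.

```python
import shlex


def build_mo_command(model_path, output_dir):
    """Assemble a Model Optimizer command line that removes zero filters."""
    return [
        "mo",
        "--input_model", model_path,
        "--output_dir", output_dir,
        "--transform=Pruning",  # removes the zero filters during conversion
    ]


cmd = build_mo_command("pruned_model.onnx", "ir_output")
print(shlex.join(cmd))

# To actually run the conversion (requires the OpenVINO development tools):
# import subprocess
# subprocess.run(cmd, check=True)
```

The command is built as a list rather than a single string so that it can be passed to ``subprocess.run`` without shell quoting issues.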


Examples
####################

* `PyTorch Image Classiication example <https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification>`__
* `PyTorch Image Classification example <https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification>`__

* `TensorFlow Image Classification example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/tensorflow/classification>`__
