From c5b7d397ea0fd442ff7f55cbb6334cd9056812b1 Mon Sep 17 00:00:00 2001
From: Chang Sun
Date: Fri, 26 Apr 2024 14:46:36 -0700
Subject: [PATCH] update docs

---
 README.md                 | 26 ++++++++++++++++++++------
 docs/_static/custom.css   |  3 +++
 docs/_static/overview.svg |  1 +
 docs/conf.py              |  4 ++++
 docs/faq.md               |  2 +-
 docs/getting_started.md   |  4 ++--
 docs/index.rst            | 30 +++++++++++++++++++++---------
 docs/install.md           | 10 +---------
 docs/reference.md         |  2 +-
 9 files changed, 54 insertions(+), 28 deletions(-)
 create mode 100644 docs/_static/custom.css
 create mode 100644 docs/_static/overview.svg

diff --git a/README.md b/README.md
index e9f787d..6281c99 100644
--- a/README.md
+++ b/README.md
@@ -8,14 +8,28 @@
 [![PyPI version](https://badge.fury.io/py/hgq.svg)](https://badge.fury.io/py/hgq)
 
-HGQ is a framework for quantization aware training of neural networks to be deployed on FPGAs, which allows for per-weight and per-activation bitwidth optimization.
+HGQ is a gradient-based automatic bitwidth optimization and quantization-aware training algorithm for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.
 
-Depending on the specific [application](https://arxiv.org/abs/2006.10159), HGQ could achieve up to 10x resource reduction compared to the traditional `AutoQkeras` approach, while maintaining the same accuracy. For some more challenging [tasks](https://arxiv.org/abs/2202.04976), where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming not too soon).
+<img src="docs/_static/overview.svg" alt="HGQ-overview" width="600"/>
 
-This repository implements HGQ for `tensorflow.keras` models. It is independent of the [QKeras project](https://github.com/google/qkeras).
+Compared to other heterogeneous quantization approaches, such as the QKeras counterpart, HGQ provides the following advantages:
 
-## Warning:
+- **High Granularity**: HGQ supports per-weight and per-activation bitwidth optimization, as well as any coarser granularity.
+- **Automatic Quantization**: By setting a resource regularization term, HGQ automatically optimizes the bitwidths of all parameters during training. Pruning happens naturally when a bitwidth is reduced to 0.
+- **Bit-accurate conversion** to `hls4ml`: What you get from the `Keras` model is exactly what you get from the `hls4ml` model. HGQ provides the proxy model as a bit-accurate conversion interface to `hls4ml`.
+  - Still subject to machine float precision limitations.
+- **Accurate Resource Estimation**: The BOPs (bit operations) estimated by HGQ track the actual (post place & route) FPGA resource consumption, roughly #LUTs + 55 * #DSPs. This metric is available during training, so one can estimate the resource consumption of the final model at a very early stage.
 
-This framework requires an **unmerged** [PR](https://github.com/fastmachinelearning/hls4ml/pull/914) of hls4ml. Please install it by running `pip install "git+https://github.com/calad0i/hls4ml@HGQ-integration"`. Or, conversion will fail with unsupported layer error.
+Depending on the specific [application](https://arxiv.org/abs/2006.10159), HGQ could achieve up to 20x resource reduction compared to the `AutoQkeras` approach, while maintaining the same accuracy. For some more challenging [tasks](https://arxiv.org/abs/2202.04976), where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming soon).
 
-## This package is still under development. Any API might change without notice at any time!
+## Installation
+
+You will need `python>=3.10` and `tensorflow>=2.13` to run this framework. You can install it via pip:
+
+```bash
+pip install hgq
+```
+
+## Usage
+
+Please refer to the [documentation](https://calad0i.github.io/HGQ/) for more details.
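The **Automatic Quantization** bullet above is driven entirely by the per-layer `beta` regularization factor. Below is a minimal training sketch; the layer and callback names (`HQuantize`, `HDense`, `ResetMinMax`, `FreeBOPs`) are assumed from the HGQ v0.2 quick-start interface, and the data, shapes, and `beta` value are placeholders.

```python
import numpy as np
from tensorflow import keras

from HGQ import FreeBOPs, ResetMinMax
from HGQ.layers import HDense, HQuantize

# Placeholder data: 16 input features, 10 regression targets.
x_train = np.random.rand(1024, 16).astype('float32')
y_train = np.random.rand(1024, 10).astype('float32')

# beta sets the strength of the resource (BOPs) regularizer: larger values push
# bitwidths lower, and a bitwidth driven to 0 effectively prunes that weight.
beta = 3e-6

model = keras.models.Sequential([
    HQuantize(beta=beta, input_shape=(16,)),   # quantize the model input
    HDense(64, activation='relu', beta=beta),
    HDense(10, beta=beta),
])

model.compile(optimizer='adam', loss='mse')

# ResetMinMax resets the tracked activation ranges between epochs;
# FreeBOPs logs the current BOPs estimate (the resource proxy above) as a metric.
model.fit(x_train, y_train, epochs=10, callbacks=[ResetMinMax(), FreeBOPs()])
```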
diff --git a/docs/_static/custom.css b/docs/_static/custom.css
new file mode 100644
index 0000000..dc41c81
--- /dev/null
+++ b/docs/_static/custom.css
@@ -0,0 +1,3 @@
+img.light {
+    color-scheme: light;
+}
diff --git a/docs/_static/overview.svg b/docs/_static/overview.svg
new file mode 100644
index 0000000..0b34ea1
--- /dev/null
+++ b/docs/_static/overview.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index 9038aaf..a5e2eb9 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -66,3 +66,7 @@
 html_theme = "sphinx_rtd_theme"
 html_static_path = ['_static']
 html_favicon = '_static/icon.svg'
+
+html_css_files = [
+    'custom.css',
+]
diff --git a/docs/faq.md b/docs/faq.md
index a913ef5..bceb96c 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -6,7 +6,7 @@ HGQ is a method for quantization aware training of neural works to be deployed o
 
 ## Why is it useful?
 
-Depending on the specific [application](https://arxiv.org/abs/2006.10159), HGQ could achieve up to 10x resource reduction compared to the traditional `AutoQkeras` approach, while maintaining the same accuracy. For some more challenging [tasks](https://arxiv.org/abs/2202.04976), where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming not too soon).
+Depending on the specific [application](https://arxiv.org/abs/2006.10159), HGQ could achieve up to 20x resource reduction compared to the traditional `AutoQkeras` approach, while maintaining the same accuracy. For some more challenging [tasks](https://arxiv.org/abs/2202.04976), where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming not too soon).
 
 ## Can I use it?
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 3f2b219..7767244 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,7 +1,7 @@
 # Quick Start
 
-```{warning}
-This guide is only for models with fully heterogeneous quantized weights. For models with partially-heterogeneous quantized weights, please refer to the [Full Usage](#Full Usage) guide.
+```{note}
+This guide is only for models with fully heterogeneous quantized weights (per-weight bitwidth).
 ```
 
 ## Model definition & training
diff --git a/docs/index.rst b/docs/index.rst
index 316002d..4a76138 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,21 +3,33 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
+==============================
 High Granularity Quantization
-===============================================================
+==============================
 
-HGQ is a framework for quantization aware training of neural networks to be deployed on FPGAs, which allows for per-weight and per-activation bitwidth optimization.
+.. image:: https://img.shields.io/badge/license-Apache%202.0-green.svg
+   :target: LICENSE
+.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
+   :target: https://calad0i.github.io/HGQ/
+.. image:: https://badge.fury.io/py/hgq.svg
+   :target: https://badge.fury.io/py/hgq
 
-Depending on the specific application_, HGQ could achieve up to 10x resource reduction compared to the traditional AutoQkeras_ approach, while maintaining the same accuracy. For some more `challenging tasks`_, where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming not too soon).
+HGQ is a gradient-based automatic bitwidth optimization and quantization-aware training algorithm for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.
 
-This repository implements HGQ for `tensorflow.keras` models. It is independent of the `QKeras project`_.
+.. rst-class:: light
+.. image:: _static/overview.svg
+   :alt: HGQ-overview
+   :width: 600
 
-Notice: this repository is still under development, and the API might change in the future.
+Compared to other heterogeneous quantization approaches, such as the QKeras counterpart, HGQ provides the following advantages:
 
-.. _application: https://arxiv.org/abs/2006.10159
-.. _AutoQkeras: https://arxiv.org/abs/2006.10159
-.. _challenging tasks: https://arxiv.org/abs/2202.04976
-.. _QKeras project: https://github.com/google/qkeras
+- **High Granularity**: HGQ supports per-weight and per-activation bitwidth optimization, as well as any coarser granularity.
+- **Automatic Quantization**: By setting a resource regularization term, HGQ automatically optimizes the bitwidths of all parameters during training. Pruning happens naturally when a bitwidth is reduced to 0.
+- **Bit-accurate conversion** to `hls4ml`: What you get from the `Keras` model is exactly what you get from the `hls4ml` model. HGQ provides the proxy model as a bit-accurate conversion interface to `hls4ml`.
+  - Still subject to machine float precision limitations.
+- **Accurate Resource Estimation**: The BOPs (bit operations) estimated by HGQ track the actual (post place & route) FPGA resource consumption, roughly #LUTs + 55 * #DSPs. This metric is available during training, so one can estimate the resource consumption of the final model at a very early stage.
+
+Depending on the specific `application <https://arxiv.org/abs/2006.10159>`_, HGQ could achieve up to 20x resource reduction compared to the `AutoQkeras` approach, while maintaining the same accuracy. For some more challenging `tasks <https://arxiv.org/abs/2202.04976>`_, where the model is already under-fitted, HGQ could still improve the performance under the same on-board resource consumption. For more details, please refer to our paper (link coming soon).
 
 Index
 =========================================================
diff --git a/docs/install.md b/docs/install.md
index 1cc807c..7f5edb4 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,19 +1,11 @@
 # Installation
 
-Use `pip install --pre HGQ` to install the latest version from PyPI. You will need a environment with `python>=3.10` installed. Currently, only `python3.10 and 3.11` are tested.
+Use `pip install HGQ` to install the latest version from PyPI. You will need an environment with `python>=3.10` installed. Currently, only Python 3.10 and 3.11 are tested.
 
 ```{warning}
 This framework requires an **unmerged** [PR](https://github.com/fastmachinelearning/hls4ml/pull/914) of hls4ml. Please install it by running `pip install "git+https://github.com/calad0i/hls4ml@HGQ-integration"`. Or, conversion will fail with unsupported layer error.
 ```
 
-```{note}
-The current varsion requires an **unmerged** version of hls4ml. Please install it by running `pip install git+https://github.com/calad0i/hls4ml`.
-```
-
 ```{warning}
 HGQ v0.2 requires `tensorflow>=2.13,<2.16` (tested on 2.13 and 2.15; 2.16 untested but may work) and `python>=3.10`. Please make sure that you have the correct version of python and tensorflow installed.
 ```
-
-```{warning}
-Due to broken dependency declaration, you will need to specify the version of tensorflow manually. Otherwise, there will likely to be version conflicts.
-```
diff --git a/docs/reference.md b/docs/reference.md
index 94d0a46..ec707b0 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -53,7 +53,7 @@ Heterogenerous layers (`H-` prefix):
 - (New in 0.2) `HActivation` with **arbitrary unary function**. (See note below.)
 
 ```{note}
-`HActivation` will be converted to a general `unary LUT` in `to_proxy_model` when
+`HActivation` will be converted to a general `unaryLUT` in `to_proxy_model` when
 - the required table size is smaller or equal to `unary_lut_max_table_size`.
 - the corresponding function is not `relu`.
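To make the `to_proxy_model` note above concrete, here is a rough end-to-end conversion sketch, continuing from a trained HGQ model `model` and calibration data `x_train` as in the training sketch earlier. The `trace_minmax`/`to_proxy_model` calls and the `unary_lut_max_table_size` keyword are assumed from the HGQ v0.2 interface, and the hls4ml backend, part, and output directory are placeholders.

```python
from HGQ import to_proxy_model, trace_minmax
from hls4ml.converters import convert_from_keras_model

# Record activation ranges (and the final BOPs estimate) on representative data.
trace_minmax(model, x_train)

# The proxy model carries the fixed-point configuration needed for bit-accurate
# conversion; unary_lut_max_table_size bounds the table size used when an
# HActivation layer is lowered to a unaryLUT (see the note above).
proxy = to_proxy_model(model, unary_lut_max_table_size=1024)

hls_model = convert_from_keras_model(
    proxy,
    backend='Vivado',
    output_dir='hls4ml_prj',
    part='xcvu9p-flga2104-2L-e',
)
hls_model.compile()
```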