Update README.md for v1.0.0 #1100
Merged

Commits (33)
- cfad81f Fix wrong note in README.md (bo3z)
- d422659 Merge branch 'main' into update-readme (jmitrevs)
- fabcf8c update the project status (jmitrevs)
- b844acf restructure of existing documentation (jmitrevs)
- 88e84f3 add an internal layers section, and auto precision (jmitrevs)
- 6abc8ad pre-commit fixes (jmitrevs)
- 7570c11 Merge remote-tracking branch 'upstream/main' into update-docs (vloncar)
- 09bbefb Typo fixes (vloncar)
- 42cb368 Add video tutorial link (bo3z)
- 26f4eb2 Merge branch 'main' into update-readme (jmitrevs)
- fedf790 respond to some review comments and update some descriptions (jmitrevs)
- f28f364 fix documentation of channels_last conversion for pytorch (JanFSchulte)
- e55b29c slightly expand discussion of channels_last in pytorch (JanFSchulte)
- 99e3be0 update requirements (jmduarte)
- 96b530f add pointwise documentation (jmduarte)
- a7b6f79 update pointwise description (jmduarte)
- 135eaa2 Merge remote-tracking branch 'upstream/main' into update-readme (vloncar)
- 6af7fef Add FAQ to docs and readme (vloncar)
- eac61dd Nicer link to the tutorial (vloncar)
- c65e915 add doc strings to pytorch-specific padding calculation functions (JanFSchulte)
- 7cf4134 Merge branch 'update-readme' of https://github.com/fastmachinelearnin… (JanFSchulte)
- 4fc1ea9 clarify default for channels last conversion in pytorch (JanFSchulte)
- 548c462 Restructure documentation (vloncar)
- 4da52a4 bump version to 1.0.0 (jmduarte)
- 6959c71 remove obsolete file references (jmitrevs)
- 47d7435 add a touch of text on the backends (jmitrevs)
- 05f8a45 expand pytorch frontend documentation (JanFSchulte)
- 6f971eb Merge branch 'main' into update-readme (JanFSchulte)
- 536c069 [pre-commit.ci] auto fixes from pre-commit hooks (pre-commit-ci[bot])
- d9d09e0 typos in pytorch frontend documentation (JanFSchulte)
- 16c4055 Merge branch 'update-readme' of https://github.com/fastmachinelearnin… (JanFSchulte)
- e69a392 improve description of brevtias -> QONNX -> hlsm4l workflow (JanFSchulte)
- 896951a Add docs on BramFactor (vloncar)
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
=============================
Automatic precision inference
=============================

The automatic precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.infer_precision.InferPrecisionTypes`) attempts to infer the appropriate
widths for a given precision. It is initiated by setting a precision in the configuration to ``'auto'``. (Note that only layer-level precisions can be set to ``'auto'``,
not model-level ones.) Functions like :py:class:`~hls4ml.utils.config.config_from_keras_model`, :py:class:`~hls4ml.utils.config.config_from_onnx_model`,
and :py:class:`~hls4ml.utils.config.config_from_pytorch_model` automatically set most precisions to ``'auto'`` when the ``'name'`` granularity is used.

.. note::
   It is recommended to pass the backend to the ``config_from_*`` functions so that they can properly extract all the configurable precisions.

The approach taken by the precision inference is to set the accumulator (the internal variable used to accumulate values in the matrix multiplications) and other precisions
so that they never truncate, using only the bit widths of the inputs (not the values). This is quite conservative, especially in cases where post-training quantization is used or
the bit widths were set fairly loosely. The recommended action in that case is to edit the configuration and explicitly set some widths, potentially in an iterative process
after profiling the data. Another option is to pass a maximum precision using the ``max_precision`` parameter of the ``config_from_*`` functions. The automatic precision
inference will then never set a bit width larger than the bit width of ``max_precision`` or an integer part larger than the integer part of ``max_precision``.
(The bit width and integer part of ``max_precision`` are treated separately.)
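A minimal sketch of requesting automatic precision inference when generating a configuration (the backend choice, the ``ap_fixed<24,8>`` cap, and the pre-existing ``model`` are illustrative assumptions):

.. code-block:: python

   import hls4ml

   # Per-layer ('name') granularity sets most precisions to 'auto' so that
   # the inference pass can fill them in later.
   config = hls4ml.utils.config_from_keras_model(
       model,                            # an existing Keras model, assumed defined elsewhere
       granularity='name',               # per-layer precisions, required for 'auto'
       backend='Vitis',                  # pass the backend so all configurable precisions are exposed (illustrative)
       max_precision='ap_fixed<24,8>',   # optional cap on the inferred widths (illustrative)
   )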

When manually setting bit widths, the accumulator can overflow, and the precision may need to be reduced. For the accumulator, it is usually a bad idea to explicitly
enable rounding or saturation modes, since they dramatically increase the execution time. For other types (e.g. output types or weight types), however, rounding and saturation handling
can be enabled as needed.
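
As an illustration, explicitly set widths might look like the following; the layer name ``dense_1``, the chosen widths, and the per-type ``Precision`` dictionary layout are assumptions based on a ``'name'``-granularity configuration, not a prescription:

.. code-block:: python

   # Hand-tuned precisions for one (hypothetical) layer in a 'name'-granularity config.
   # Rounding/saturation is enabled on the output type only; the accumulator keeps the
   # default (cheap) overflow behaviour and is simply made wide enough.
   config['LayerName']['dense_1']['Precision']['result'] = 'ap_fixed<16,6,AP_RND,AP_SAT>'
   config['LayerName']['dense_1']['Precision']['accum'] = 'ap_fixed<24,12>'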
==================================
Loading weights from external BRAM
==================================

.. note::
   This feature is being evaluated for re-implementation. We welcome feedback from users on how to make the implementation more flexible.

``hls4ml`` can optionally store weights in BRAMs external to the design. This is supported in the Vivado/Vitis and Catapult backends. It is the responsibility of the user to ensure the weights are properly loaded during the operation of the design.

The feature works as a threshold, exposed through the ``BramFactor`` config parameter. Layers with more weights than the threshold will be exposed as a BRAM interface. Consider the following code:

.. code-block:: python

   import tensorflow as tf
   from tensorflow.keras.layers import Dense

   import hls4ml

   model = tf.keras.models.Sequential()
   model.add(Dense(10, activation="relu", input_shape=(12,), name="dense_1"))
   model.add(Dense(20, activation="relu", name="dense_2"))
   model.add(Dense(5, activation="softmax", name="dense_3"))
   model.compile(optimizer='adam', loss='mse')

   config = hls4ml.utils.config_from_keras_model(model)
   config["Model"]["Strategy"] = "Resource"
   config["Model"]["BramFactor"] = 100

   # output_dir, io_type and backend are assumed to be defined by the user
   hls_model = hls4ml.converters.convert_from_keras_model(
       model, hls_config=config, output_dir=output_dir, io_type=io_type, backend=backend
   )

Having set ``BramFactor=100``, only layers with more than 100 weights will be exposed as external BRAM, in this case layers ``dense_1`` and ``dense_2``. ``BramFactor`` can currently only be set at the model level. The generated code will now have the weights as part of the interface.

.. code-block:: c++

   void myproject(
       hls::stream<input_t> &dense_1_input,
       hls::stream<result_t> &layer7_out,
       model_default_t w2[120],
       model_default_t w4[200]
   ) {
       #pragma HLS INTERFACE axis port=dense_1_input,layer7_out
       #pragma HLS INTERFACE bram port=w2,w4
       ...

When integrating the design, users can use the exposed interface to implement a weight reloading scheme.
========
Concepts
========

How it Works
------------

.. image:: ../img/nn_map_paper_fig_2.png
   :width: 70%
   :align: center

Consider a multilayer neural network. At each neuron in a layer :math:`m` (containing :math:`N_m` neurons), we calculate an output value (part of the output vector :math:`\mathbf{x}_m` of said layer) as the sum of the output values of the previous layer, each multiplied by an independent weight, plus a bias value. An activation function is then applied to the result to obtain the final output value for the neuron. Representing the weights as an :math:`N_m` by :math:`N_{m-1}` matrix :math:`W_{m,m-1}`, the bias values as :math:`\mathbf{b}_m`, and the activation function as :math:`g_m`, we can express this compactly as:

.. math::

   \mathbf{x}_m = g_m (W_{m,m-1} \mathbf{x}_{m-1} + \mathbf{b}_m)

With hls4ml, each layer of output values is calculated independently in sequence, using pipelining to speed up the process by accepting new inputs after an initiation interval.
The activations, if nontrivial, are precomputed.
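
A plain NumPy sketch of the per-layer computation above (independent of any hls4ml API; the sizes and the choice of activation are illustrative):

.. code-block:: python

   import numpy as np

   def dense_layer(x_prev, W, b, activation=np.tanh):
       """One layer: x_m = g_m(W_{m,m-1} x_{m-1} + b_m)."""
       return activation(W @ x_prev + b)

   W = np.random.randn(4, 3)    # N_m x N_{m-1} weight matrix
   b = np.random.randn(4)       # bias vector b_m
   x_prev = np.random.randn(3)  # outputs of the previous layer
   x_m = dense_layer(x_prev, W, b)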

To ensure optimal performance, the user can control aspects of their model, principally:

* **Size/Compression** - Though not explicitly part of the ``hls4ml`` package, this is an important optimization to efficiently use the FPGA resources
* **Precision** - Define the :doc:`precision <../advanced/profiling>` of the calculations in your model
* **Dataflow/Resource Reuse** - Control parallel or streaming model implementations with varying levels of pipelining
* **Quantization Aware Training** - Achieve the best performance at low precision with tools like QKeras, and benefit automatically during inference with ``hls4ml`` parsing of QKeras models

.. image:: ../img/reuse_factor_paper_fig_8.png
   :width: 70%
   :align: center

Often, these decisions will be hardware dependent to maximize performance.
Note that any simplification or compression of the input network must be done before using ``hls4ml`` to generate HLS code if the compression is to provide a sizable speedup.
Also important to note is the use of fixed-point arithmetic in ``hls4ml``, which improves processing speed relative to floating-point implementations.
The ``hls4ml`` package also offers the functionality of configuring the binning and output bit width of the precomputed activation functions as necessary.

With respect to parallelization and resource reuse, ``hls4ml`` offers a "reuse factor" parameter that determines the number of times each multiplier is used in order to compute a layer of neurons' values. Therefore, a reuse factor of one splits the computation so that each multiplier only has to perform one multiplication in the computation of the output values of a layer, as shown above. Conversely, a reuse factor of four, in this case, uses a single multiplier four times sequentially. A low reuse factor achieves the lowest latency and highest throughput but uses the most resources, while a high reuse factor saves resources at the expense of longer latency and lower throughput.
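
As a sketch of how this is steered from the configuration (the layer name and the values are hypothetical), the reuse factor can be set at the model level or, in a ``'name'``-granularity configuration, per layer:

.. code-block:: python

   # Global setting: every layer reuses each multiplier up to 4 times.
   config['Model']['ReuseFactor'] = 4

   # Per-layer override for a (hypothetical) layer named 'dense_2'.
   config['LayerName']['dense_2']['ReuseFactor'] = 8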

Frontends and Backends
----------------------

``hls4ml`` has a concept of a **frontend** that parses the input NN into an internal model graph, and a **backend** that controls
what type of output is produced from the graph. Frontends and backends can be independently chosen. Examples of frontends are the
parsers for Keras or ONNX, and examples of backends are Vivado HLS, Intel HLS, and Vitis HLS. See :ref:`Status and Features` for the
currently supported frontends and backends, or the dedicated sections for each frontend/backend.

I/O Types
---------

``hls4ml`` supports multiple styles for handling data transfer to/from the network and between layers, known as the ``io_type``.

io_parallel
^^^^^^^^^^^
In this processing style, data is passed in parallel between the layers. Conceptually this corresponds to a C/C++ array, where all elements can be accessed at any time. This style allows for maximum parallelism and is well suited for MLPs and small CNNs that aim for the lowest latency. Due to the impact of parallel processing on FPGA resource utilization, synthesis may fail for larger networks.

io_stream
^^^^^^^^^
As opposed to the parallel processing style, in ``io_stream`` mode data is passed one "pixel" at a time. Each pixel is an array of channels, which are always sent in parallel. This method for sending data between layers is recommended for larger CNN and RNN networks. For one-dimensional ``Dense`` layers, all the inputs are streamed in parallel as a single array.
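
The ``io_type`` is chosen at conversion time; a minimal sketch (the backend and output directory are illustrative, and ``model`` and ``config`` are assumed to exist):

.. code-block:: python

   # Convert a model with a streaming interface between layers.
   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       hls_config=config,
       io_type='io_stream',      # or 'io_parallel' (illustrative choice)
       backend='Vitis',          # illustrative backend
       output_dir='my-hls-test', # illustrative output directory
   )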

With the ``io_stream`` IO type, each layer is connected to the subsequent layer through first-in first-out (FIFO) buffers.
The implementation of the FIFO buffers contributes to the overall resource utilization of the design, impacting in particular the BRAM or LUT utilization.
Because neural networks can generally have complex architectures, it is hard to know a priori the correct depth of each FIFO buffer.
By default, ``hls4ml`` chooses the most conservative possible depth for each FIFO buffer, which can result in an unnecessary overutilization of resources.

In order to reduce the impact on the resources used for the FIFO buffer implementation, we have a FIFO depth optimization flow. This is described
in the :ref:`FIFO Buffer Depth Optimization` section.

Strategy
--------

**Strategy** in ``hls4ml`` refers to the implementation of the core matrix-vector multiplication routine, which can be latency-oriented, resource-saving oriented, or specialized. Different strategies will have an impact on the overall latency and resource consumption of each layer, and users are advised to choose based on their design goals. The availability of a particular strategy for a layer varies across backends; see the :doc:`Attributes <../ir/attributes>` section for a complete list of available strategies per layer and per backend.
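
As a sketch of how a strategy is selected (the layer name is hypothetical; the model-level form is the one used in the BramFactor example above):

.. code-block:: python

   # Model-wide strategy, as in the BramFactor example above.
   config['Model']['Strategy'] = 'Resource'

   # Per-layer override in a 'name'-granularity configuration (hypothetical layer name).
   config['LayerName']['dense_3']['Strategy'] = 'Latency'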
Does it make sense to also add this to the documentation, not just the readme?