Unrolled implementation for latency dense/conv layers #1014

calad0i · 2024-05-14T23:32:09Z

Description

Implements manually unrolled dense/conv layers for latency models. A switch to enable this optimization is not yet implemented.
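To illustrate what "manually unrolled" means here, the following is a minimal sketch and not the algorithm from this PR. It assumes a dense layer with constant integer weights and emits one C-like assignment per output with the weights inlined. The function name `unroll_dense` and the variable names `x` and `y` are hypothetical.

```python
# Illustrative only: not the code generator from this PR.
def unroll_dense(weights, in_name="x", out_name="y"):
    """Emit one C-like assignment per output, with constant weights inlined.

    weights[i][j] is the weight from input i to output j.
    """
    n_in, n_out = len(weights), len(weights[0])
    lines = []
    for j in range(n_out):
        terms = []
        for i in range(n_in):
            w = weights[i][j]
            if w == 0:
                continue  # a zero weight contributes no term
            if w == 1:
                terms.append(f"{in_name}[{i}]")
            elif w == -1:
                terms.append(f"-{in_name}[{i}]")
            else:
                terms.append(f"{w} * {in_name}[{i}]")
        rhs = " + ".join(terms) if terms else "0"
        lines.append(f"{out_name}[{j}] = {rhs};")
    return lines


for line in unroll_dense([[1, 0], [-2, 3]]):
    print(line)
# y[0] = x[0] + -2 * x[1];
# y[1] = 3 * x[1];
```

Because every weight is known at code-generation time, zero weights disappear and unit weights need no multiplier, which is the kind of saving an unrolled latency implementation can exploit.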

Currently, importing hls4ml.optimization.fused_dotp.optimizer_pass.vitis causes all Dense and Conv1D/Conv2D layers to be unrolled for the vivado/vitis backends.
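Since the optimization is enabled purely as an import side effect, a guarded import is one way to opt in. This is a sketch: the module path is the one named above and exists only on this PR's branch, while the helper name `enable_unrolled_layers` is hypothetical.

```python
import importlib

# Module path as given in this PR description; it exists only on this branch.
UNROLL_PASS_MODULE = "hls4ml.optimization.fused_dotp.optimizer_pass.vitis"


def enable_unrolled_layers():
    """Import the pass module for its side effect.

    Returns True if the module was imported, False if it is unavailable
    (for example, when hls4ml is not installed or is a different version).
    """
    try:
        importlib.import_module(UNROLL_PASS_MODULE)
    except ImportError:
        return False
    return True


print("unrolled layers enabled:", enable_unrolled_layers())
```

The import has to happen before the model is converted, because the description states that it affects all Dense and Conv layers from that point on.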

Breaking changes

As this feature also resides in hls4ml/optimization, the code from #768 at that path is moved to a subdirectory.

Code added by "hls4ml Optimization API [Part 1]" (#768) is moved from hls4ml/optimization to hls4ml/optimization/dsp_aware_pruning.
np_config.enable_numpy_behavior() is removed because it silently changes TensorFlow's global behavior, which may cause issues in other code (e.g., a < b, where a and b are tf.Tensors, may no longer work with that option enabled; tested with TensorFlow 2.13).

As a result, compatibility with #809 is to be checked.
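Downstream code that imports the #768 package may need to handle both locations during the transition. The sketch below tries the new path first and falls back to the old one. Both package paths come from this description; the helper name `import_pruning_package` is hypothetical.

```python
import importlib

# New location first, then the pre-move location, as described above.
CANDIDATE_PACKAGES = (
    "hls4ml.optimization.dsp_aware_pruning",
    "hls4ml.optimization",
)


def import_pruning_package():
    """Return the first importable package from CANDIDATE_PACKAGES, or None."""
    for name in CANDIDATE_PACKAGES:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


pkg = import_pruning_package()
print("pruning package:", pkg.__name__ if pkg else "not available")
```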

Type of change

New feature (non-breaking change which adds functionality)

Tests

Pending, will be added to HGQ tests.

Checklist

I have read the guidelines for contributing.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have installed and run pre-commit on the files I edited or added.
I have added tests that prove my fix is effective or that my feature works.

calad0i · 2024-10-08T01:14:32Z

Closing because the algorithm is being deprecated. I will open another PR for the new one.

jmitrevs added this to the v1.1.0 milestone Jul 15, 2024

calad0i force-pushed the symbolic-codegen-pr branch 2 times, most recently from cfd4b60 to d7e6652 Compare July 22, 2024 06:40

calad0i marked this pull request as draft July 31, 2024 06:06

calad0i added 7 commits August 27, 2024 12:25

mv dsp_aware_pruning location

03a22d7

dsp aware pruning - rm numpy behavior global override

475157c

+ unrolled latency impl

d46f682

manifest fix

b6457f4

support pointwise conv layers

cec6a83

allow fully unrolled conv/multidense

28f51bd

allow disable full unroll and dsp offload

3cf2f2f

calad0i force-pushed the symbolic-codegen-pr branch from 379f68f to 3cf2f2f Compare August 28, 2024 00:32

calad0i added 8 commits September 6, 2024 13:14

better resource surrogate

ce32bd2

allow sat_sym in symbolic precision

34d947a

update unroll alg, add plain unroll, remove add+neg fuse in sym var

db08a05

update unroll alg again

ba25caf

more options in unroller

b05fe0c

dotpunroll: move all config to _global_config

cb6bd8e

fix

df80842

misc

ec233df

calad0i closed this Oct 8, 2024
