hls4ml Optimization API [Part 2] #809

Merged

Conversation

bo3z
Contributor

@bo3z bo3z commented Jun 13, 2023

Description

  • Second part of the hls4ml Optimization API: hls4ml Optimization API [Part 1] #768
  • Introduces Dense Unrolled layers, optimising away multiplications by zero in the Resource strategy with RF > 1 (see the sketch after this list)
  • Introduces additional TCL scripts to optimise away all-zero BRAM blocks.
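
For illustration, a minimal Python sketch of the idea behind the unrolled kernel: with the weights fixed at synthesis time, multiplications by zero weights can be skipped outright. This is a conceptual sketch only (names are hypothetical), not the generated HLS code:

import numpy as np

def dense_unrolled(x, weights, bias):
    # Conceptual model of the unrolled kernel: visit only non-zero
    # weights, so multiplications by zero never happen.
    y = bias.copy()
    for (i, j), w in np.ndenumerate(weights):
        if w != 0:  # in the HLS version this check is resolved at code-generation time
            y[j] += x[i] * w
    return y

# Example: a pruned 4x2 weight matrix with most entries zero
x = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([[0.5, 0.0], [0.0, -1.0], [0.0, 0.0], [2.0, 0.0]])
b = np.zeros(2)
print(dense_unrolled(x, W, b))  # [ 8.5 -2. ]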

Type of change

Tests

  • Added a new test, test_dense_unrolled, that verifies the dense resource layers that avoid zero multiplications produce correct results (a sketch of such a test follows this list)
  • A comparison with the "standard" Dense Resource will shortly be available in the (updated) PR hls4ml Optimization API [Part 1] #768.
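
As a rough illustration, a test along these lines could use the public hls4ml Keras conversion API. This is a hedged sketch: the strategy name 'Unrolled' and the tolerance are assumptions, not the exact contents of test_dense_unrolled:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import hls4ml

# Small dense model with some weights zeroed out, as after pruning
model = Sequential([Dense(8, input_shape=(16,), activation='relu')])
w, b = model.layers[0].get_weights()
w[np.abs(w) < 0.1] = 0  # crude sparsification for the example
model.layers[0].set_weights([w, b])

config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Strategy'] = 'Unrolled'  # assumed strategy name
config['Model']['ReuseFactor'] = 4        # RF > 1 exercises the new kernel

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='test_dense_unrolled_prj'
)
hls_model.compile()

X = np.random.rand(100, 16)
np.testing.assert_allclose(hls_model.predict(X), model.predict(X), atol=0.05)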

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@bo3z
Contributor Author

bo3z commented Jun 13, 2023

I will add pre-commit separately; the last time I ran it, some tests were broken, so I will add it in a subsequent commit.

@bo3z bo3z requested a review from vloncar June 13, 2023 20:37
@jmduarte jmduarte added this to the v0.8.0 milestone Jun 15, 2023
@bo3z
Contributor Author

bo3z commented Jun 16, 2023

This is ready for review. It seems that pre-commit can re-arrange the order of includes in C++ header files, which could cause a compilation error.

@bo3z bo3z force-pushed the hls4ml-optimization-api-part-2 branch from 20ed996 to 0f0adc4 on June 16, 2023 11:36
@bo3z bo3z mentioned this pull request Jun 16, 2023
@jmitrevs jmitrevs modified the milestones: v0.8.0, v1.0.0 Oct 20, 2023
@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label Jan 8, 2024
@jmitrevs
Contributor

jmitrevs commented Feb 7, 2024

We merged part 1. Should we merge part 2?

@vloncar
Contributor

vloncar commented Feb 7, 2024

I'm reviewing it. Slowly 😃. But it's next in line, then HGQ.

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels May 3, 2024
@jmitrevs
Contributor

jmitrevs commented May 3, 2024

The pytest error is unrelated to the PR, so from my side this can be merged. I'll let Vladimir give the final OK.

Contributor

Why switch the default implementation to Verilog here? What's the impact of the added optimization on firmware performance and synthesis time?

Contributor Author

@bo3z bo3z Sep 4, 2024

There shouldn't be any noticeable impact. There are two reasons this change was made:

  1. When you use VHDL with the Resource strategy, the synthesis report gives 0 BRAM (which cannot be the case, especially since Verilog shows the correct BRAM usage). I suspect this has something to do with out-of-context synthesis in Vivado, but I haven't figured out how to enable / disable it. This is explained further in: Vivado synthesis report - zero BRAM utilisation (OOC) #798

  2. It's "functionally" incorrect to do co-simulation / validation with Verilog and synthesis with VHDL (since HLS always generates both formats anyway). Yes, if one is correct and the other is wrong, then it's an issue with the HLS compiler and we can't do much about it. But my reasoning is that we should test and "deploy" the same set of end files.
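
For context, this same-set-of-files reasoning maps onto hls4ml's standard build flags. A minimal sketch, assuming hls_model is an already-converted model (these flags exist on the Vivado backend's build call; the exact combination is just an illustration):

# Synthesize, co-simulate and export in one project, so the RTL that is
# validated (cosim) is the same RTL that is delivered (export / vsynth).
hls_model.build(csim=False, synth=True, cosim=True, export=True, vsynth=True)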

Contributor

Then this should probably go into a separate PR, I guess, as it changes the behavior of the library even when DSP-aware pruning is not used. How about isolating bo3z@e53ec9c into another PR?

newline += indent + '#pragma HLS DATAFLOW \n'

model_cfg = model.config.get_config_value('HLSConfig')['Model']
if (
Contributor

It seems that if unrolled resource conv layers are used, dataflow will not be used. Also, if n_partition is not 1, won't the global II then be larger than the internal dense II (reuse_factor)?
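
To spell out the II concern with a tiny worked example (this relation is one reading of the question, not something stated in the PR):

# If the conv layer is split into n_partition sequential partition calls
# and each internal dense kernel has II = reuse_factor, the top-level II
# is bounded below by their product rather than by reuse_factor alone:
reuse_factor = 4
n_partition = 2
global_ii = n_partition * reuse_factor
print(global_ii)  # 8, i.e. larger than reuse_factor = 4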

@vloncar
Contributor

vloncar commented Aug 25, 2024

This PR was refactored to introduce the new unrolled implementation as a "strategy", an alternative to the existing latency and resource strategies. This allowed the matrix-vector multiplication kernel to be used as a function, simplifying the integration with the rest of the code. The PR also changes the top-level pipeline style pragma, so the config now includes a new "auto" option (the default) which allows the optimizer to choose the best one. All pipeline-style decisions are now made in the new optimizer, instead of being scattered around the HLSConfig class and the backend.

One more minor change may come. Since we will have multiple new strategies and optimization options, it was suggested to give this optimization technique a name and move it to a submodule of that name. Discussion on this is welcome.
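
As a usage sketch of the refactored interface described above (the key names 'Unrolled' and 'PipelineStyle' are illustrative of the new strategy and "auto" option, not authoritative):

import hls4ml
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(8, input_shape=(16,))])  # any Keras model

config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Strategy'] = 'Unrolled'    # the new strategy, alongside Latency / Resource
config['Model']['PipelineStyle'] = 'auto'   # default: let the optimizer choose the pragma
hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config)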

@vloncar vloncar added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Aug 25, 2024
@vloncar
Contributor

vloncar commented Oct 7, 2024

All the comments from the last discussion were addressed (the strategy was renamed and the files moved to a new submodule). It's now ready to be merged.

@JanFSchulte JanFSchulte merged commit 352c124 into fastmachinelearning:main Oct 22, 2024
4 of 5 checks passed