
Update Verilator to version 5.006 #158

Merged: 10 commits into main from update-verilator on Aug 1, 2024
Conversation

@colluca (Collaborator) commented Jul 8, 2024

General

Details

  • Add --timing to support timing constructs, e.g. as will be present in the iDMA tracer.
  • Add the VLT_JOBS option to control the number of parallel Verilation jobs, useful to keep memory usage under control (see the usage sketch after this list).
  • Verilator 5.006 seems to require more memory than 4.110, specifically more than 16 GB even with a single job, causing the GitHub CI to fail. We attempted hierarchical Verilation on the Snitch core complex, but this fails due to missing support for type parameters (see Support 'parameter type' in hierarchical blocks verilator/verilator#5309). To solve this issue we provide the github-ci.hjson config with a reduced core count (4 cores) for use in the GitHub CI.
  • The out-of-memory failure occurred only with the default HW config default.hjson, and not with fdiv.hjson. In any case it's best to align all configurations to the default, so this is included as part of the PR.
  • Generalize all tests to work with an arbitrary number of cores (>4).
  • Temporarily disable the ATAX test, to be fixed in a separate commit.
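
A rough sketch of how these knobs might be used, assuming the repo's make-based flow (the bin/snitch_cluster.vlt target is the one named later in this thread; CFG_OVERRIDE and the config path are assumptions for illustration, not verbatim from this PR):

```sh
# Limit parallel compile jobs during Verilation to keep peak memory
# below the CI runner's limit (VLT_JOBS is the option added here):
make VLT_JOBS=1 bin/snitch_cluster.vlt

# Build against the reduced 4-core configuration for the GitHub CI
# (variable name and path assumed, exact spelling may differ):
make CFG_OVERRIDE=cfg/github-ci.hjson bin/snitch_cluster.vlt
```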

@zero9178 (Contributor) commented Jul 9, 2024

I tested this PR in our downstream Docker image and test suite (https://github.com/opencompl/Quidditch/blob/main/runtime/toolchain/Dockerfile), in part to evaluate whether this would then also close #148.

A few things sadly happened as soon as I used the newer SHA with the newer Verilator release:

  • The verilated model seems to crash immediately if compiled with Clang, even on the newest Clang versions (tested 14 and 18.1). I suspected a stack overflow initially, but it also crashes with an unlimited stack, so I am not sure.
    It would be a shame to lose Clang, as using Clang previously cut our simulation time by 1/5. Applying PGO to the Verilator build reduced simulation time by a further 50% (though this could also be done with GCC).
    I see the same behaviour in Verilator 5.026.
  • Even when building with GCC (we are running with GCC 12.1, a tad newer than the GCC 11.1 in Ubuntu 22.04), half the tests in snitch_cluster are failing, without yet using VLT_NUM_THREADS. I haven't yet looked further into why. I might also be doing something wrong, though, as it seems to work on our executables. I also noticed that the log files have different names now (trace_hart_0000x.dasm), so maybe this is also the cause.
  • Compiling with 9 threads causes a bit of a slowdown on single-threaded executables (not unexpected). I am now running tests that are mostly multi-threaded workloads (as in: using all cores of the cluster).

@zero9178 (Contributor) commented Jul 9, 2024

So the good news is that the new Verilator model is a lot faster running a whole NN that leverages all 9 cores of a Snitch cluster. Previously it took 5076 seconds; this has now been reduced to 2200 seconds, i.e. a reduction of more than 50%.

I used VLT_NUM_THREADS=9, thinking that this matches the inherent parallelism of the Verilator model, but a more optimal value may exist (our machines have 16 threads to play with). For GitHub Actions we would most likely scale this back to 4 at most.
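
For reference, a minimal sketch of the invocation meant here (the bin/snitch_cluster.vlt target is the one named later in this thread; this is illustrative, not verbatim from the PR):

```sh
# Build the verilated model with 9 simulation threads; the default
# stays at 1 thread for backwards compatibility.
make VLT_NUM_THREADS=9 bin/snitch_cluster.vlt
```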

The correctness issue in #148 seems to have been fixed as well.

@colluca (Collaborator, Author) commented Jul 11, 2024

@zero9178 thanks for the detailed testing and report!

The last results look promising, so I would close #148 in favour of this. Did you manage to test this with GCC 11.1 in the end? Or were you able to fix the previous errors?

Regarding the log names, this should be due to #58, but anyway the traces shouldn't affect the correctness of any of the software tests.

@colluca colluca mentioned this pull request Jul 11, 2024
@colluca colluca force-pushed the update-verilator branch from 720c1d7 to e5f7611 Compare July 11, 2024 13:47
@zero9178 (Contributor) commented:

> @zero9178 thanks for the detailed testing and report!
>
> The last results look promising, so I would close #148 in favour of this. Did you manage to test this with GCC 11.1 in the end? Or were you able to fix the previous errors?

I sadly haven't gotten around to doing this yet and I am currently short on time, but will do so as soon as possible.
(Hopefully by next week when we're in Zurich 🙂)

@colluca (Collaborator, Author) commented Jul 15, 2024

> I sadly haven't gotten around to doing this yet and I am currently short on time, but will do so as soon as possible. (Hopefully by next week when we're in Zurich 🙂)

Looking forward to meeting you; we can surely discuss this in that setting as well :)

@fischeti fischeti mentioned this pull request Jul 19, 2024
2 tasks
@colluca colluca force-pushed the update-verilator branch from e5f7611 to 3ca593a Compare July 22, 2024 07:06
@zero9178 (Contributor) commented:

I believe I have found and fixed the issue that affected Clang and potentially the newer GCC versions as well. It seems to have been a stack overflow after all. The culprits are the files/functions that Verilator designates as slow and therefore compiles with -O0. Since neither GCC nor Clang performed any optimizations, code in these files had excessive stack usage. Adding OPT_SLOW="-O1" to my make invocation when building bin/snitch_cluster.vlt fixed it and made the Clang-built model pass every test in the repo as well.
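
As a minimal sketch of this workaround (OPT_SLOW is Verilator's standard makefile variable for the optimization flags applied to its "slow", non-performance-critical generated files; the target name is the one from the comment above):

```sh
# Compile the "slow" generated files with -O1 instead of -O0 so the
# compiler trims their excessive stack frames and the model no longer
# overflows the stack at startup.
make OPT_SLOW="-O1" bin/snitch_cluster.vlt
```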

@colluca (Collaborator, Author) commented Jul 23, 2024

> I believe I have found and fixed the issue that affected Clang and potentially the newer GCC versions as well. It seems to have been a stack overflow after all. The culprits are the files/functions that Verilator designates as slow and therefore compiles with -O0. Since neither GCC nor Clang performed any optimizations, code in these files had excessive stack usage. Adding OPT_SLOW="-O1" to my make invocation when building bin/snitch_cluster.vlt fixed it and made the Clang-built model pass every test in the repo as well.

Great news! 🎉 I will proceed with the merge then :)

@colluca colluca force-pushed the update-verilator branch from 3ca593a to 40cff28 Compare July 23, 2024 14:04
@colluca colluca marked this pull request as ready for review July 23, 2024 14:26
@colluca colluca force-pushed the update-verilator branch from 40cff28 to 257229e Compare July 26, 2024 16:05
@colluca colluca requested a review from lucabertaccini as a code owner July 26, 2024 23:04
@colluca colluca force-pushed the update-verilator branch 2 times, most recently from 8df70ab to 54aeece Compare July 27, 2024 14:44
@colluca colluca requested a review from viv-eth as a code owner July 27, 2024 15:35
@colluca colluca force-pushed the update-verilator branch from 2187057 to 2362ebb Compare July 28, 2024 10:01
@fischeti (Contributor) left a comment

Very nice! Thanks a lot. The CI should pass again with #172.

colluca and others added 3 commits August 1, 2024 10:15
Using more threads in the simulation may reduce simulation time, especially when the program running in the simulator makes use of multiple cores.
This PR therefore adds the `VLT_NUM_THREADS` option to `make` to allow users to set the number of threads used in the simulator. For backwards compatibility, and since the optimal number of threads is environment- and code-dependent, the default value of 1 was kept.
@colluca colluca merged commit 51c0300 into main Aug 1, 2024
23 of 25 checks passed
@colluca colluca deleted the update-verilator branch August 1, 2024 10:35