[HW] 🐛 Bug fixes #308

mp-17 · 2024-06-19T16:23:15Z

#270 rework

This PR introduces:

Various bug fixes (including exception generation and handling + reshuffle mechanism).
Basic support for RVV CSR vstart.

Changelog

Fixed

Stall Ara and wait for ara_idle upon CSR write/read
Fix how VLSU reports exceptions
Set vstart=0 upon successful vector instructions
Fix vstart usage in ara dispatcher
Fix reshuffle mechanism

Changed

Rename CSRs in dispatcher to improve clarity
Change from reporting "errors" to full "exceptions"
Bump CVA6 to a version that supports "exceptions" reporting

Checklist

Automated tests pass
Changelog updated
Code style guideline is observed

- ara_idle_i is asserted two cycles after an instruction in the dispatcher. Therefore, we cannot use it to provide full consistency for CSR reads/writes.
- Also, ara_idle_i comes from part of the scoreboard in the sequencer. It's better not to tie it to the output path to CVA6, since that path is already delicate.

If LMUL_X has X > 1, Ara injects one reshuffle at a time for each register between Vn and V(n+X-1) that has an EEW mismatch. Each of these reshuffles targets a different Vm with LMUL_1, but they all target the same register (Vn with LMUL_X) from the point of view of the hazard checks on the next instruction that has a dependency on Vn with LMUL_X. We cannot just inject one macro reshuffle, since the registers between Vn and V(n+X-1) can have different encodings. So, we need finer-grain reshuffles, which mess up the dependency tracking.

For example, vst @, v0 (LMUL_8) will use the registers from v0 to v7. If they are all reshuffled, we will end up with 8 reshuffle instructions that will get IDs from 0 to 7. The store will then see a dependency only on the reshuffle ID that targets v0. This is wrong: if the store opreq is faster than the slide opreq once the v0-reshuffle is over, it will violate the RAW dependency.

To avoid this, the safest but least optimal fix is to just wait in WAIT_IDLE after a reshuffle with LMUL > 1. There are many possible optimizations to this:

1) Check if, when LMUL > 1, we reshuffled more than 1 register. If we reshuffled only 1 register, we can skip the WAIT_IDLE.
2) Check if all the X registers need to be reshuffled (common case). If this is the case, inject a single large reshuffle with LMUL_X and skip WAIT_IDLE.
3) Instead of waiting in WAIT_IDLE, inject the reshuffles starting from V(n+X-1) instead of Vn. This will automatically adjust the dependency check and will speed up the whole operation a bit.
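The dependency problem described above can be illustrated with a small behavioral sketch in Python. This is not the RTL and not the dispatcher's actual logic: the names `Reshuffle`, `plan_reshuffles`, `untracked_registers`, and `needs_wait_idle` are hypothetical, the sequential IDs simply mirror the "IDs from 0 to 7" example, and the hazard check is assumed to record only the reshuffle that targets the base register of the group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reshuffle:
    insn_id: int  # hypothetical ID handed out at injection time
    vreg: int     # single register reshuffled with LMUL_1


def plan_reshuffles(base, lmul, eew_of, target_eew):
    """One LMUL_1 reshuffle per register in [base, base + lmul) with an EEW mismatch."""
    mismatched = [v for v in range(base, base + lmul) if eew_of[v] != target_eew]
    return [Reshuffle(insn_id=i, vreg=v) for i, v in enumerate(mismatched)]


def untracked_registers(plan, base):
    """Registers whose reshuffle the consumer does not wait for, assuming its
    hazard check only records the reshuffle that targets the base register."""
    return [r.vreg for r in plan if r.vreg != base]


def needs_wait_idle(plan, lmul, skip_single=False):
    """Conservative fix: wait after any reshuffle with LMUL > 1.

    With skip_single=True, apply optimization 1 and skip the wait when
    only one register of the group was reshuffled.
    """
    if lmul <= 1 or not plan:
        return False
    if skip_single and len(plan) == 1:
        return False
    return True


# Example from the description: a store on v0 with LMUL_8, all registers mismatched.
eew_of = {v: 8 for v in range(32)}  # assumed current encoding of every register
plan = plan_reshuffles(base=0, lmul=8, eew_of=eew_of, target_eew=64)
print([r.insn_id for r in plan])          # IDs of the injected reshuffles
print(untracked_registers(plan, base=0))  # registers exposed to the RAW violation
print(needs_wait_idle(plan, lmul=8))      # conservative policy
```

In this model the store would only wait for the reshuffle on v0, so the reshuffles on the remaining registers of the group are exactly the ones that can be overtaken. The `skip_single` flag shows how optimization 1 narrows the conservative wait without changing the common multi-register case.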

mp-17 mentioned this pull request Jun 19, 2024

[Draft] ✨ 🐛 Bug fixes and vstart CSR support #270

Closed

3 tasks

MaistoV added 2 commits June 27, 2024 13:40

Rename CSRs in ara dispatcher

8d4fd75

Stall Ara upon operations on vector CSRs

9591d0d

mp-17 force-pushed the mp/fixes branch from 2fed184 to 3f6089b Compare June 27, 2024 16:55

mp-17 and others added 11 commits June 27, 2024 19:41

Change "errors" to "exceptions"

908416c

Extend and fix Ara exception reporting from VLSU

aa4320b

Set vstart=0 for succesful vector instructions

31c50f8

[hardware] Fix vstart handling in dispatcher

68f3a87

[hardware] 🐛 Fix reshuffling bug in dispatcher

259fcae

[hardware] 🐛 Fix eew_q update during reshuffle

5128727

[hardware] 🐛 Fix reshuffle

e34a24d

[hardware] 🐛 Consider LMUL when deciding if to reshuffle vd

f302889

[hardware] Bump CVA6

4b5d31a

mp-17 force-pushed the mp/fixes branch from 3f6089b to 4b5d31a Compare June 27, 2024 17:41

[CHANGELOG] Update Changelog

e2b474d

mp-17 merged commit b4aa64e into main Jun 28, 2024
13 checks passed

mp-17 deleted the mp/fixes branch June 28, 2024 12:32
