
[HW] 🐛 Bug fixes #308

Merged
merged 14 commits into from
Jun 28, 2024

Conversation


@mp-17 mp-17 commented Jun 19, 2024

#270 rework

This PR introduces:

Various bug fixes (including exception generation and handling, plus the reshuffle mechanism).
Basic support for the RVV CSR vstart.

Changelog

Fixed

  • Stall Ara and wait for ara_idle upon CSR writes/reads
  • Fix how the VLSU reports exceptions
  • Set vstart=0 upon successful completion of vector instructions
  • Fix vstart usage in the Ara dispatcher
  • Fix the reshuffle mechanism
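As a sketch of the vstart behavior these fixes target (per the RVV spec, vstart marks the first element an instruction executes and is reset to zero when the instruction completes without trapping), here is a toy Python model. This is not Ara's RTL; `exec_vadd` and `fault_at` are hypothetical names used only for illustration.

```python
# Toy model of RVV vstart semantics (illustration, not Ara's RTL).
# An element-wise op resumes at csr["vstart"] and resets vstart to 0
# on successful completion; a trap leaves vstart at the faulting element.

def exec_vadd(vd, vs1, vs2, vl, csr, fault_at=None):
    """Hypothetical element-wise add; fault_at injects a trap."""
    for i in range(csr["vstart"], vl):
        if fault_at is not None and i == fault_at:
            csr["vstart"] = i          # trap: record the resume point
            return False
        vd[i] = vs1[i] + vs2[i]
    csr["vstart"] = 0                  # success: vstart resets to 0
    return True

csr = {"vstart": 0}
vd = [0] * 4
exec_vadd(vd, [1, 2, 3, 4], [10, 20, 30, 40], 4, csr, fault_at=2)
# trap at element 2: csr["vstart"] == 2
exec_vadd(vd, [1, 2, 3, 4], [10, 20, 30, 40], 4, csr)
# resumed and completed: csr["vstart"] == 0, vd == [11, 22, 33, 44]
```

The handler can thus re-execute the same instruction and it picks up where it trapped, which is why vstart must be cleared only on successful completion.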

Changed

  • Rename CSRs in dispatcher to improve clarity
  • Switch from reporting "errors" to full "exceptions"
  • Bump CVA6 to a version that supports exception reporting

Checklist

  • Automated tests pass
  • Changelog updated
  • Code style guideline is observed

mp-17 and others added 11 commits June 27, 2024 19:41
 - ara_idle_i is asserted only two cycles after an instruction enters the
   dispatcher. Therefore, we cannot use it to provide full consistency
   for CSR reads/writes.
 - Also, ara_idle_i comes from part of the scoreboard in the sequencer.
   It's better not to tie it to the output path to CVA6, since that path
   is delicate already.
If LMUL_X has X > 1, Ara injects one reshuffle at a time for each register
between Vn and V(n+X-1) that has an EEW mismatch.
All these reshuffles target different Vm registers with LMUL_1, but from
the point of view of the hazard checks, the next instruction that depends
on Vn with LMUL_X sees them all as writes to the same register (Vn with
LMUL_X).

We cannot just inject one macro reshuffle, since the registers between
Vn and V(n+X-1) can have different encodings. So we need finer-grained
reshuffles, which mess up the dependency tracking.

For example,
vst @, v0 (LMUL_8)
will use the registers from v0 to v7. If they are all reshuffled, we
end up with 8 reshuffle instructions that get IDs from 0 to 7. The
store will then see a dependency only on the reshuffle ID that targets
v0. This is wrong: if the store operand request (opreq) is faster than
the slide opreq once the v0 reshuffle is over, it will violate the RAW
dependency on v1 to v7.
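The dependency miss described above can be sketched with a toy model of ID-based hazard tracking. This is an illustration only, not the RTL; `issue_reshuffles` and `store_dependencies` are hypothetical helpers.

```python
# Toy model (not the RTL): per-register reshuffles each get their own
# instruction ID, but a hazard check keyed on the store's base register
# records a dependency only on the reshuffle that targets that register.

def issue_reshuffles(first_reg, count, next_id=0):
    """One reshuffle per vector register, each with a fresh ID."""
    return [{"id": next_id + i, "dest": first_reg + i} for i in range(count)]

def store_dependencies(reshuffles, store_reg):
    """Hazard check keyed on the base register sees only one ID."""
    return [r["id"] for r in reshuffles if r["dest"] == store_reg]

reshuffles = issue_reshuffles(first_reg=0, count=8)   # v0..v7, IDs 0..7
deps = store_dependencies(reshuffles, store_reg=0)
print(deps)  # [0]
```

The store waits only on ID 0 (the v0 reshuffle); once that completes it is free to start reading v1..v7 even though their reshuffles may still be in flight, which is exactly the RAW violation described above.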

To avoid this, the safest (and most suboptimal) fix is to just
wait in WAIT_IDLE after a reshuffle with LMUL > 1.

There are several possible optimizations:
 1) When LMUL > 1, check whether we reshuffled more than one register.
If we reshuffled only one, we can skip the WAIT_IDLE.
 2) Check whether all X registers need to be reshuffled (the common case).
If so, inject a single large reshuffle with LMUL_X and skip WAIT_IDLE.
 3) Instead of waiting in WAIT_IDLE, inject the reshuffles starting from
V(n+X-1) instead of Vn. This automatically fixes the dependency check
and speeds up the whole operation a bit.
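Optimization (3) can be sketched by extending the same toy model: issuing the per-register reshuffles in descending register order means the reshuffle of Vn gets the highest (newest) ID. Assuming reshuffles complete in issue order, a dependent instruction keyed on Vn then waits for the last reshuffle, which transitively covers all of them. Again a hypothetical sketch, not the RTL.

```python
# Sketch of optimization (3): issue reshuffles from V(n+X-1) down to Vn,
# so the reshuffle targeting Vn receives the highest ID. Assumes
# reshuffles retire in issue order, so waiting on the newest ID also
# waits for every earlier one.

def issue_reshuffles_descending(base_reg, lmul, next_id=0):
    regs = range(base_reg + lmul - 1, base_reg - 1, -1)   # V(n+X-1) .. Vn
    return [{"id": next_id + k, "dest": r} for k, r in enumerate(regs)]

resh = issue_reshuffles_descending(base_reg=0, lmul=8)    # v7..v0, IDs 0..7
dep = max(r["id"] for r in resh if r["dest"] == 0)
print(dep)  # 7
```

Now the instruction depending on v0 waits on ID 7, the last-issued reshuffle, so the RAW dependency on v0..v7 is preserved without an explicit WAIT_IDLE.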
@mp-17 mp-17 merged commit b4aa64e into main Jun 28, 2024
13 checks passed
@mp-17 mp-17 deleted the mp/fixes branch June 28, 2024 12:32