add option for .w suffix boilerplating of all compressible instructions #83
Comments
Hey @jnk0le, thanks for bringing this to our attention! While developing and tuning the model, I also figured that we probably want … However, this would require more changes than just in the printing, as the scheduling properties of the expanded instruction may be different, and more possibilities for register renaming would need to be taken into account, i.e., what's currently modeled as, e.g., …
I didn't spot such behaviour on M4/M7, only things like the compiler preferring a "shifted constant" over encoding T4 (better issuing) or having to choose between … That should be a thing on CM33 or CM55, though. (M85 can triple-issue NOPs and branches, but that's independent of the offending instruction's size.)
Thanks for your input on that matter!
I agree; I just did not want to rule out that this case could come up. On M85, for example, this could matter, though. From the Software Optimization Guide: "The latency from the shifter source operand is 2, regardless of whether the shift immediate value is non-zero or not." This means that using …
That seems to be the case with chained (3-4+) dependencies; otherwise, the stall is somehow folded by the early/late ALU.
Related to #61, as I have already spotted some instances of compressible but .w-suffixed instructions used in some inputs "for no reason".
Certain microarchitectures may suffer performance degradation due to the use of compressed instructions.
To avoid this, and the resulting false positives/negatives in benchmarking, all instructions need to be forced into their uncompressed form (i.e. boilerplated with the .w suffix). So as not to burden "naive" writers, this needs to be handled by SLOTHY via a config option.
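On the printing side, the requested option amounts to rewriting each compressible mnemonic when the output is emitted. Here is a minimal Python sketch of that step, assuming one instruction per line in GNU-style unified syntax with `//` comments. `force_wide` and the `COMPRESSIBLE` allow-list are hypothetical illustrations, not part of SLOTHY's API. As the discussion notes, changing the printed width alone would not update the scheduling model, which would also have to account for the wide encodings.

```python
import re

# Hypothetical, deliberately incomplete allow-list of mnemonics that have both
# a 16-bit and a 32-bit Thumb-2 encoding. A real tool would derive this from
# its instruction model rather than from a hard-coded set.
COMPRESSIBLE = {
    "add", "adds", "sub", "subs", "mov", "movs", "cmp",
    "and", "ands", "eor", "eors", "orr", "orrs",
    "lsl", "lsls", "lsr", "lsrs", "ldr", "str",
}

# leading whitespace, mnemonic, optional existing width qualifier, operands
_INSN = re.compile(r"^(\s*)([A-Za-z]+)(\.[nNwW])?(\s.*)?$")


def force_wide(line: str) -> str:
    """Append a .w qualifier to a compressible mnemonic that has none."""
    code, sep, comment = line.partition("//")
    m = _INSN.match(code)
    if m is None:
        return line  # labels, directives, blank lines, ...
    indent, mnemonic, width, rest = m.groups()
    if width is not None or mnemonic.lower() not in COMPRESSIBLE:
        return line  # explicit .n/.w is respected; other mnemonics are untouched
    return f"{indent}{mnemonic}.w{rest or ''}{sep}{comment}"
```

For example, `force_wide("  add r0, r1, r2")` yields `"  add.w r0, r1, r2"`, while `"  ldr.n r0, [r1]"` is left alone. Condition-coded mnemonics (e.g. inside IT blocks) and `@` comments are not handled by this sketch.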
cortex-m7:
For maximum IPC, all instructions need to be uncompressed, and one needs to forget about load/store double/multiple (no further penalties after "normal" stalls).
Of course, it is possible to compress (and use CISCy instructions) without penalties, but I couldn't figure out the exact pattern, and trial-and-error probing on hardware is too much for a superoptimizer.
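To follow the Cortex-M7 advice, an input could first be scanned for load/store double/multiple instructions. Below is a minimal, hypothetical Python lint for that: `multi_access_lines` is an illustrative name, not a SLOTHY feature, and it assumes one instruction per line with the mnemonic as the first whitespace-separated field. Condition codes are not handled.

```python
import re

# LDRD/STRD and LDM/STM (including the PUSH/POP aliases and the IA/DB/FD/EA
# addressing-mode spellings), with an optional .n/.w qualifier.
_MULTI = re.compile(r"^(ldrd|strd|ldm|stm|push|pop)(ia|db|fd|ea)?(\.[nw])?$")


def multi_access_lines(lines):
    """Return (line_no, line) pairs that use load/store double or multiple."""
    hits = []
    for no, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and _MULTI.match(fields[0].lower()):
            hits.append((no, line))
    return hits
```

Flagged lines would then have to be split into plain wide loads/stores by hand (or by the tool), which this sketch does not attempt.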
cortex-m3/4:
.w loads need to be aligned (their instruction bits) on word boundaries, or they will fail to pipeline. (Schwabe & Stoffelen's AES work went the "all uncompressed" way.)
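Whether a 32-bit load is word-aligned depends only on the sizes of the instructions before it. A minimal Python sketch of that arithmetic follows, assuming each instruction is given as a (mnemonic, size in bytes) pair and that any mnemonic starting with `ldr` counts as a load. `misaligned_wide_loads` is a hypothetical helper, not part of SLOTHY.

```python
from typing import List, Tuple

# (mnemonic, encoding size in bytes: 2 for 16-bit, 4 for 32-bit)
Insn = Tuple[str, int]


def misaligned_wide_loads(insns: List[Insn], start: int = 0) -> List[int]:
    """Return indices of 32-bit loads whose encoding does not start on a 4-byte boundary."""
    bad = []
    addr = start  # byte address of the current instruction
    for i, (mnemonic, size) in enumerate(insns):
        if size not in (2, 4):
            raise ValueError(f"unexpected Thumb instruction size {size}")
        if size == 4 and mnemonic.lower().startswith("ldr") and addr % 4 != 0:
            bad.append(i)
        addr += size
    return bad
```

For instance, a single 16-bit instruction in front of a wide load shifts it to offset 2 and flags it. If every instruction is 4 bytes and the block starts word-aligned, every address is a multiple of 4, which is why the "all uncompressed" approach sidesteps the problem.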