-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rotation code should use intrinsics on platforms other than MSVC #14
Comments
IIRC x86intrin.h isn't really standard; probably better to use |
Well it is covered somewhat in the answer. None of them are standard by any stretch of the imagination, but it seems that |
Ugh. I'm pretty sure x86intrin.h isn't available on MSVC, probably other compilers as well. I guess we'll have to add checks on |
Yes, but that's the bread and butter of portable snippets, right? The work is done here so users don't have to do it :) |
Sure, but it means it will take some time to develop. Unfortunately it's not just a matter of choosing the right file name, we also have to figure out when each compiler added support for a particular function, and thanks to FWIW, while clang has |
You are right that clang up to the current version (4.0?) anyway doesn't seem to have any rotate intrinsic. I had seen it reported that they did, but it must have been an error or included from somewhere else. Here's a good summary of the current issue which is that there is no C/C++ form of rotate that is both (a) free of UB and (b) recognized by both gcc and clang as a rotate (hence emitting Perhaps a simple solution is to simply use the Or you could still use the intrinsic for gcc , which apparently started supporting it sometime after 4.4.7 and before or at 4.5.3. It's probably simple enough just to be conservative and start supporting it at 4.5.3 or later. Or since they seem to be declared as macros you could just use an #ifdef to detect it? I suppose it is possible that they won't always be declared like that though. |
I guess inline assembly for x86 on compilers other than GCC and MSVC would be best… clang supports ACLE so for ARM on clang we can use I just pushed a commit ( |
Mind if I re-purpose this issue to apply to all the MSVC intrinsics? There are a bunch of other functions which would benefit from the same analysis… for example, implementing |
I think using the C rotate idiom that
These translate directly into unadorned The code generated for the standalone function is ideal in
The two Nonw, in clang prior to 3.5 (back to 3.0, the earliest I could test on godbolt) it is still recognized, but the
An extra cycle of latency from the rotate count (which is usually not the important chain, as it is often constant or available early), but otherwise still very fast. In terms of code generation, inline asm has some issues (see point 6 here for example). The inline asm version compiles to the same good code as the C idiom (and never has the redundant
The semantics of the C version is understood and Let's turn off vectorization (aka magic) and just look at the non vectorized versions: Main C loop:
The compiler understood the constant 13 rotate, and also the semantics of the operation and unrolled the loop 4 times. It is able to use a memory RMW to do 4 rotates in 7 instructions, with a very low amount of total overhead (perhaps 1.5 fused uops per rotate). The asm loop:
It is 8 instructions and it does a single operation per loop. It can still inline the function, but it otherwise does a bad job. It will never be able to use the "immediate" form of In fact all of the above applies to You are certainly welcome to use this for the "more instrinsics" bug too, although now we have a lot of rotate specific stuff :p - much of the above doesn't apply to other intrinsics which don't have the "recognized idiom" feature. |
In C++20 there are |
As far as I can tell, compiler rotate intrinsics are only used on MSVC.
As mentioned here, however, most platforms offer those intrinsics at least for x86 in
<x86intrin.h>
, so it seems like they could be used when available.The text was updated successfully, but these errors were encountered: