In addition to the changes in the previous R2.AC2 release, this release is built with clang. Thanks to clang's advanced vectorizer, it is 25% faster than R2.AC2 release 1, and the performance gap between this release and neo-FFT3D on QTGMC(Preset='Very Slow') is negligible, all without using SIMD intrinsics.
Special Notes: this release requires a CPU that supports at least the AVX+FMA+BMI1 extensions (i.e. Haswell or better).
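A quick way to check a machine against this requirement is sketched below. It is only an illustration, not part of the release: the helper names are hypothetical, and it assumes a Linux host where `/proc/cpuinfo` exposes a `flags` line using the kernel's names `avx`, `fma`, and `bmi1`. On Windows the same information has to come from a different tool.

```python
# Minimal sketch: check a CPU flag list for the extensions this release needs.
# Assumption: flag names follow Linux /proc/cpuinfo ("avx", "fma", "bmi1").
REQUIRED = {"avx", "fma", "bmi1"}


def missing_extensions(flags_line):
    """Return a sorted list of required extensions absent from flags_line."""
    present = set(flags_line.lower().split())
    return sorted(REQUIRED - present)


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' entry from /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass  # not Linux, or the file is not readable
    return ""
```

Calling `missing_extensions(read_linux_cpu_flags())` returns an empty list when all three extensions are reported. Note that it also lists all three as missing when the flags could not be read at all, so that case needs a separate check.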
Performance figures were measured on VS R57, Skylake Xeon, Windows Server 2019 (bt=3).