Release v4.0 · intel/x86-simd-sort

v4.0 is a significant release with new features and performance improvements. The AVX-512 sorting methods are up to 2x faster, and new AVX2 sorting methods extend support to a wider range of x86 processors. In addition to being usable as a header-only library, x86-simd-sort can now be built and installed as a library that exposes the sorting methods through an API with automatic runtime dispatch, selecting the fastest version for the processor it runs on. Here is a quick summary of the changes:

Added AVX2 implementations (avx2_qsort, avx2_qselect and avx2_partial_qsort) for 32-bit and 64-bit data types. Compared to std::sort, these are up to 12x faster for 32-bit data and up to 7x faster for 64-bit data.
x86-simd-sort can now be built and installed as a shared library. The library provides runtime dispatch and automatically picks the fastest version among AVX-512, AVX2 and scalar, depending on the processor it runs on. Starting with Clear Linux v40270, you can install x86-simd-sort with swupd bundle-add x86-simd-sort.
Performance improvements to avx512_qsort: 2x speedup for 32-bit data, 1.5x speedup for 64-bit data and 1.25x speedup for 16-bit data.
Performance improvements to avx512_argsort and avx512_argselect, intended to mitigate the effect of a vulnerability in the gather instruction.
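
To illustrate what "runtime dispatch" means in the summary above, here is a minimal sketch of the general technique: detect the CPU's capabilities once, then route every call through the matching function pointer. This is not the library's implementation. All names (Isa, detect_isa, select_sort, dispatched_sort) are hypothetical, the "AVX2" and "AVX-512" functions are stand-ins that simply call std::sort, and CPU detection assumes the GCC/Clang builtin __builtin_cpu_supports on x86.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout: an illustration of runtime ISA dispatch,
// not code taken from x86-simd-sort.
enum class Isa { Scalar, Avx2, Avx512 };

// Report the widest instruction set the current CPU supports.
// Falls back to Scalar on compilers or targets without the builtin.
inline Isa detect_isa() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return Isa::Avx512;
    if (__builtin_cpu_supports("avx2")) return Isa::Avx2;
#endif
    return Isa::Scalar;
}

using SortFn = void (*)(float*, std::size_t);

// Scalar fallback.
inline void sort_scalar(float* a, std::size_t n) { std::sort(a, a + n); }
// Stand-ins: a real library would put vectorized kernels here.
inline void sort_avx2(float* a, std::size_t n) { std::sort(a, a + n); }
inline void sort_avx512(float* a, std::size_t n) { std::sort(a, a + n); }

// Map a detected instruction set to the implementation to call.
inline SortFn select_sort(Isa isa) {
    switch (isa) {
        case Isa::Avx512: return sort_avx512;
        case Isa::Avx2:   return sort_avx2;
        default:          return sort_scalar;
    }
}

// Public entry point: the choice is made once, on the first call.
inline void dispatched_sort(float* a, std::size_t n) {
    static const SortFn fn = select_sort(detect_isa());
    assert(fn != nullptr);
    fn(a, n);
}
```

A caller only ever sees dispatched_sort(data, size). The selection cost is paid once because the function pointer is stored in a function-local static.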

What's Changed

Changes quicksort and quickselect to use template based sorting networks by @sterrettm2 in #61
Correct documentation on NaN behavior by @zbjornson in #73
update CI that builds NumPy by @r-devulap in #79
Fix MSVC build error by @sterrettm2 in #76
Use scalar emulation of gather instruction for arg methods by @r-devulap in #65
Bug fix in avx512_qselect_fp16 by @r-devulap in #80
Build shared library with runtime ISA dispatch by @r-devulap in #74
CI: Do not install google benchmarks and use different compiler versions by @r-devulap in #82
Add option to compare benchmarks across branches by @r-devulap in #84
Use env var to disable/enable avx512 dispatch by @r-devulap in #85
Various performance improvements by @sterrettm2 in #83
Fix NumPy CI failures by @r-devulap in #86
CI: numpy/core is renamed to numpy/_core by @r-devulap in #87
Enable argsort and argselect methods on 32-bit platforms by @r-devulap in #88
Disable prefetch on clang-cl WIN32 by @r-devulap in #89
Explicitly initialize with zmm_vector<uint64_t> for arg methods on macOS by @r-devulap in #90
Reorganize code and add examples to build by @r-devulap in #91
Adds support for AVX2 for 32-bit types for quicksort and quickselect by @sterrettm2 in #60
Add hasnan = false to all the sort methods by @r-devulap in #92
Provide a way to install x86simdsort as a library by @r-devulap in #94
AVX2 64-bit support by @sterrettm2 in #93
Add soversion to the lib by @r-devulap in #95
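
Two entries above touch NaN handling (#73 and the hasnan = false default from #92). As a rough sketch of why a sort routine might take such a flag, assume hasnan means "the input may contain NaNs": comparison-based sorting is not well defined when NaNs are present, so one option is to move them out of the way first and skip that extra pass when the caller promises there are none. The helper below, sort_maybe_nan, is hypothetical and built on std::partition and std::sort. It is not the library's code, and the library may handle NaNs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of x86-simd-sort.
// With hasnan == true, NaNs are first moved to the end of the array and
// only the NaN-free prefix is sorted. With hasnan == false (the default),
// the caller asserts that no NaNs are present and the extra pass is skipped.
inline void sort_maybe_nan(double* arr, std::size_t n, bool hasnan = false) {
    double* last = arr + n;
    if (hasnan) {
        // Non-NaN values go to the front; `last` marks where the NaNs begin.
        last = std::partition(arr, last,
                              [](double x) { return !std::isnan(x); });
    }
    std::sort(arr, last);
}
```

Calling sort_maybe_nan(data, size, true) on an array that contains NaNs leaves the finite values sorted at the front and the NaNs grouped at the end.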

New Contributors

@sterrettm2 made their first contribution in #61
@zbjornson made their first contribution in #73

Full Changelog: v3.0...4.0.rc
