v4.0
v4.0 is a significant release with new features and improvements. AVX-512 sorting methods are up to 2x faster, and we have added AVX2 sorting methods to support a wider range of x86 processors. In addition to being used as a header-only library, x86-simd-sort can now be installed as a library that provides API access to the various sorting methods, with automatic runtime dispatch to select the fastest version for the processor. Here is a quick summary of all the changes:
- Added AVX2 implementations of avx2_qsort, avx2_qselect and avx2_partial_qsort for 32-bit and 64-bit data types. When compared to std::sort, these are up to 12x faster for 32-bit data and up to 7x faster for 64-bit data.
- x86-simd-sort can now be built and installed as a shared library. The library provides runtime dispatch and automatically picks the fastest version among AVX-512/AVX2/scalar depending on the processor it is run on. Starting with clearlinux v40270, you can install x86-simd-sort with swupd bundle-add x86-simd-sort.
- Perf improvements to avx512_qsort: 2x speed up for 32-bit data, 1.5x speed up for 64-bit data and 1.25x speed up for 16-bit data.
- Perf improvements to avx512_argsort and avx512_argselect intended to mitigate the effect of a vulnerability in gather instruction.
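To make the runtime dispatch idea concrete, here is a minimal sketch of how a library might pick among AVX-512, AVX2 and scalar code paths at run time. It is not the x86-simd-sort implementation: every name in it (sort_fn, sort_scalar, sort_avx2, sort_avx512, pick_sort, dispatched_sort, dispatched_sort_target) is hypothetical, the three kernels are stand-ins that all forward to std::sort, and the feature check assumes GCC or Clang on x86, where __builtin_cpu_supports is available. The exact CPU features the real library tests for may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is an illustration, not library code.
using sort_fn = void (*)(float *, std::size_t);

// Stand-in kernels. A real library would compile each one with the matching
// ISA flags; here all three simply forward to std::sort.
static void sort_scalar(float *a, std::size_t n) { std::sort(a, a + n); }
static void sort_avx2(float *a, std::size_t n) { std::sort(a, a + n); }
static void sort_avx512(float *a, std::size_t n) { std::sort(a, a + n); }

struct Choice {
    sort_fn fn;
    const char *name;
};

// Probe the CPU and return the best available kernel.
static Choice pick_sort()
{
    const Choice scalar{sort_scalar, "scalar"};
    const Choice avx2{sort_avx2, "avx2"};
    const Choice avx512{sort_avx512, "avx512"};
#if (defined(__GNUC__) || defined(__clang__)) \
        && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return avx512;
    if (__builtin_cpu_supports("avx2")) return avx2;
#else
    (void)avx2;
    (void)avx512;
#endif
    return scalar;
}

// The selection is made once, on first use, and reused afterwards.
static const Choice &selected()
{
    static const Choice c = pick_sort();
    return c;
}

// Public entry point: callers never name an instruction set.
void dispatched_sort(float *a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    selected().fn(a, n);
}

// Reports which path was chosen, which is handy when benchmarking.
const char *dispatched_sort_target()
{
    return selected().name;
}
```

Caching the choice in a function-local static keeps the CPU probe off the hot path, so each call costs one indirect function call on top of the sort itself.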
What's Changed
- Changes quicksort and quickselect to use template based sorting networks by @sterrettm2 in #61
- Correct documentation on NaN behavior by @zbjornson in #73
- update CI that builds NumPy by @r-devulap in #79
- Fix MSVC build error by @sterrettm2 in #76
- Use scalar emulation of gather instruction for arg methods by @r-devulap in #65
- Bug fix in avx512_qselect_fp16 by @r-devulap in #80
- Build shared library with runtime ISA dispatch by @r-devulap in #74
- CI: Do not install google benchmarks and use different compiler versions by @r-devulap in #82
- Add option to compare benchmarks across branches by @r-devulap in #84
- Use env var to disable/enable avx512 dispatch by @r-devulap in #85
- Various performance improvements by @sterrettm2 in #83
- Fix NumPy CI failures by @r-devulap in #86
- CI: numpy/core is renamed to numpy/_core by @r-devulap in #87
- Enable argsort and argselect methods on 32-bit platforms by @r-devulap in #88
- Disable prefetch on clang-cl WIN32 by @r-devulap in #89
- Explicitly initialize with zmm_vector<uint64_t> for arg methods on macOS by @r-devulap in #90
- Reorganize code and add examples to build by @r-devulap in #91
- Adds support for AVX2 for 32-bit types for quicksort and quickselect by @sterrettm2 in #60
- Add hasnan = false to all the sort methods by @r-devulap in #92
- Provide a way to install x86simdsort as a library by @r-devulap in #94
- AVX2 64-bit support by @sterrettm2 in #93
- Add soversion to the lib by @r-devulap in #95
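The arg methods sort an array of indices by the keys those indices point to, which is why they rely on gathering keys from scattered positions. The sketch below illustrates, under stated assumptions, what "scalar emulation of gather" can mean: one ordinary load per lane instead of a single hardware gather. The names gather_scalar, argsort_reference and kLanes are hypothetical and are not taken from the library; the lane count of 8 assumes 64-bit elements in a 512-bit register, and the argsort is a plain std::sort reference rather than the vectorized algorithm.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumption: eight 64-bit lanes, as in one 512-bit register.
constexpr std::size_t kLanes = 8;

// Scalar stand-in for a hardware gather: out[lane] = base[idx[lane]].
// Each lane is filled by an ordinary load, so no gather instruction is used.
template <typename T>
std::array<T, kLanes> gather_scalar(const T *base,
                                    std::size_t n,
                                    const std::array<std::size_t, kLanes> &idx)
{
    std::array<T, kLanes> out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        assert(idx[lane] < n); // every index must be in bounds
        out[lane] = base[idx[lane]];
    }
    return out;
}

// Reference argsort: returns the permutation that orders the keys
// and leaves the keys themselves untouched.
template <typename T>
std::vector<std::size_t> argsort_reference(const std::vector<T> &keys)
{
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(), [&keys](std::size_t a, std::size_t b) {
        return keys[a] < keys[b];
    });
    return idx;
}
```

Gathering the keys through the permutation returned by argsort_reference yields them in ascending order, which is a convenient way to check an arg method against a reference.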
New Contributors
- @sterrettm2 made their first contribution in #61
- @zbjornson made their first contribution in #73
Full Changelog: v3.0...4.0.rc