Release v3.0 · intel/x86-simd-sort

Version 3.0 adds a new method, avx512_argselect, which computes an arg nth_element (known as argpartition in NumPy): it returns an array of indices that would partition the data array. Highlights of this release include:

x86-simd-sort v3.0 has been merged into the NumPy main branch, providing AVX-512 vectorized versions of np.partition and np.argpartition. It speeds up np.partition by up to 25x for 16-bit dtypes, 17x for 32-bit dtypes, and about 8x for 64-bit dtypes. Speedups for np.argpartition are up to 6.5x.
A slightly modified version of x86-simd-sort has now been merged into OpenJDK. It speeds up sorting 32-bit and 64-bit data by up to 15x and 7x, respectively.
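To make the arg nth_element semantics of avx512_argselect concrete, here is a scalar C++ reference built on std::nth_element over an index array. It is a sketch, not the library's code: `argselect_reference` is a hypothetical name, and its signature is not that of avx512_argselect, which performs this selection with AVX-512 instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Scalar reference for "arg nth_element" (NumPy: argpartition).
// Returns indices idx such that data[idx[k]] is the value that would sit at
// position k in sorted order, every idx[i] with i < k points to a value
// <= data[idx[k]], and every idx[i] with i > k points to a value >= it.
// HYPOTHETICAL helper: not the signature of the library's avx512_argselect.
template <typename T>
std::vector<std::size_t> argselect_reference(const std::vector<T>& data, std::size_t k) {
    assert(k < data.size());  // precondition: k must be a valid position
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    // Partition the index array by comparing the values the indices point to;
    // the data array itself is left untouched.
    std::nth_element(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                     [&data](std::size_t a, std::size_t b) { return data[a] < data[b]; });
    return idx;
}
```

For example, with data {5, 1, 4, 2, 3} and k = 2, data[idx[2]] is 3, the first two indices point at 1 and 2 in some order, and the last two point at 4 and 5 in some order.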

What's Changed

Fix typo in README by @r-devulap in #50
Update workflow for changes in benchmark tooling by @mosullivan93 in #54
Further Makefile updates by @mosullivan93 in #52
Add avx512_argselect for 32-bit and 64-bit dtypes by @r-devulap in #56
Use __builtin_cpu_supports instead of cpuinfo by @r-devulap in #58
MAINT: Remove template specializations for quicksort methods by @r-devulap in #59
Add benchmarks for small arrays by @r-devulap in #62
Improvement to benchmarking scripts by @r-devulap in #66
Bug fix in benchmark script by @r-devulap in #67
Use global Macros for GCC specific keywords by @r-devulap in #68
Fix compiler warnings in src and tests by @r-devulap in #69
Fix more compiler warnings by @r-devulap in #70
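PR #58 replaces cpuinfo with the GCC/Clang builtin __builtin_cpu_supports for runtime CPU detection. The sketch below shows that style of check. Assumptions: the required feature set (AVX512F, CD, BW, DQ, VL, i.e. the Skylake-X subset) is a guess rather than the library's documented requirement, and `cpu_has_avx512_skx` and `sort_dispatch` are hypothetical names, with std::sort standing in for the vectorized kernel so the example compiles anywhere.

```cpp
#include <algorithm>
#include <vector>

// Runtime AVX-512 check via the GCC/Clang builtin __builtin_cpu_supports.
// ASSUMPTION: the AVX-512 path needs the Skylake-X subset tested below;
// the real library's requirements may differ.
inline bool cpu_has_avx512_skx() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    // Only strictly required before static constructors have run;
    // calling it again afterwards is harmless.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512cd") &&
           __builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("avx512dq") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;  // non-x86 target or other compiler: take the scalar path
#endif
}

// HYPOTHETICAL dispatcher: std::sort stands in for BOTH branches here;
// in a real build the first branch would call the AVX-512 kernel.
template <typename T>
void sort_dispatch(std::vector<T>& v) {
    static const bool use_avx512 = cpu_has_avx512_skx();  // detect once per process
    if (use_avx512) {
        std::sort(v.begin(), v.end());  // placeholder for the vectorized kernel
    } else {
        std::sort(v.begin(), v.end());  // scalar fallback
    }
}
```

Caching the result in a function-local static means the feature check runs once per process rather than on every sort call.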

Full Changelog: v2.0...v3.0
