v3.0
Version 3.0 adds a new method, avx512_argselect, which computes an arg nth_element (known as argpartition in NumPy): it returns an array of indices that would partition the data array. Highlights of this release include:
- x86-simd-sort v3.0 has been merged into NumPy's main branch. It provides AVX-512 vectorized versions of np.partition and np.argpartition. It speeds up np.partition by up to 25x for 16-bit dtypes, up to 17x for 32-bit dtypes, and about 8x for 64-bit dtypes. Speedups for np.argpartition are up to 6.5x.
- A slightly modified version of x86-simd-sort has now been merged into OpenJDK. It speeds up sorting of 32-bit and 64-bit data by up to 15x and 7x, respectively.
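To make the argselect semantics described above concrete, here is a minimal scalar sketch built on std::nth_element over an index array. It is not the x86-simd-sort implementation or its API: the name argselect_reference is hypothetical, it assumes 0 <= k < arr.size(), and it assumes the data contains no NaNs (the plain < comparator is not a valid ordering when NaNs are present).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar reference for "arg nth_element" (argpartition).
// Returns indices idx such that:
//   - arr[idx[k]] is the value that would be at position k if arr were sorted,
//   - arr[idx[i]] <= arr[idx[k]] for every i < k,
//   - arr[idx[i]] >= arr[idx[k]] for every i > k.
// The input array itself is left unmodified.
template <typename T>
std::vector<int64_t> argselect_reference(const std::vector<T>& arr, int64_t k) {
    assert(k >= 0 && k < static_cast<int64_t>(arr.size()));
    std::vector<int64_t> idx(arr.size());
    std::iota(idx.begin(), idx.end(), int64_t{0});  // 0, 1, ..., n-1
    // Partially order the indices by the values they point to (assumes no NaNs).
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                     [&arr](int64_t a, int64_t b) { return arr[a] < arr[b]; });
    return idx;
}
```

For example, with data = {9, 1, 7, 3, 5} and k = 2, idx[2] is 4, the index of the median value 5; the first two indices point at 1 and 3 and the last two at 7 and 9, in unspecified order. A vectorized version keeps the same contract while partitioning many elements per instruction.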
What's Changed
- Fix typo in README by @r-devulap in #50
- Update workflow for changes in benchmark tooling by @mosullivan93 in #54
- Further Makefile updates by @mosullivan93 in #52
- Add avx512_argselect for 32-bit and 64-bit dtypes by @r-devulap in #56
- Use __builtin_cpu_supports instead of cpuinfo by @r-devulap in #58
- MAINT: Remove template specializations for quicksort methods by @r-devulap in #59
- Add benchmarks for small arrays by @r-devulap in #62
- Improvement to benchmarking scripts by @r-devulap in #66
- Bug fix in benchmark script by @r-devulap in #67
- Use global Macros for GCC specific keywords by @r-devulap in #68
- Fix compiler warnings in src and tests by @r-devulap in #69
- Fix more compiler warnings by @r-devulap in #70
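PR #58 replaces cpuinfo with __builtin_cpu_supports for detecting CPU features. As a rough illustration of that kind of runtime check, the sketch below queries AVX-512F support with the GCC/Clang built-ins and picks a code path. The helper names cpu_has_avx512f and select_sort_path are hypothetical, and this is not the library's actual detection code; it assumes a GCC- or Clang-compatible compiler on x86 and falls back to "scalar" everywhere else.

```cpp
#include <string>

// Hypothetical helper: true if the CPU running this code reports AVX-512F.
// Uses the GCC/Clang x86 built-ins; on other compilers or architectures
// this sketch conservatively returns false.
bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // harmless here; required only if called before static initialization
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

// Hypothetical dispatcher: names the code path a caller could choose at run time.
std::string select_sort_path() {
    return cpu_has_avx512f() ? "avx512" : "scalar";
}
```

Checking at run time rather than at compile time lets a single binary use AVX-512 kernels on CPUs that support them while still running on older hardware.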
Full Changelog: v2.0...v3.0