v2.0
Release v2.0 adds a host of new sorting algorithms and supported API's. We also welcome new members to the community who contributed heavily to this release and we are deeply grateful for their efforts. We’d like to acknowledge Mitchell J. O'Sullivan @mosullivan93 and @ruclz /Xi Tang @xtangxtang for adding partial sorting methods and key-value sorting methods.. Here are the main highlights of this release:
New features:
- AVX-512 based
argsort
algorithms using O(1) space for 32-bit and 64-bit data types [1]. It returns the indices that would sort an array. These are up to 6x faster when compared to a scalar solution that usesstd::sort
. This new feature is leveraged by NumPy innp.argsort
[2] in its latest release v1.25. - AVX-512 based quick select algorithm [3] for 16-bit, 32-bit and 64-bit data types. These are equivalent to
std::nthelement
but performs a lot faster. The performance depends on the ratio K/N (where K is the index of the element the array is partitioned around and N is the array size). For smaller values of K, it is up to 6x faster for 64-bit data, up to 15x faster for 32-bit data and up to 7x faster for 16-bit data. The performance gets better as K gets larger (see #13 (comment) for more details). - AVX-512 based partial sort algorithm [3] for 16-bit, 32-bit and 64-bit data types. These are equivalent to
std::partial_sort
. As with quick select, its performance depends on the ratio K/N (where K is the size of partial sorted array and N is the array size) and it tends to perform an order of magnitude faster for larger values. It is about 1.05x faster for tiny partial array sort and up to 20x faster for slightly larger partial arrays. See this #49 (comment) for detailed performance numbers. - AVX-512 based key-value sort [4]. This sorts a pair of key-value arrays and is currently supported only for 64-bit data types. A version of this has been contributed to Oceanbase, an open source distributed relational database. See [8].
- AVX-512 sort for _Float16 data type [5] using AVX-512 FP16 ISA. In NumPy, these are nearly 3x faster than AVX-512 based sort that emulates float16 data type [7] .
Improvements:
- Improvements to AVX-512 quicksort [6] further speeds up sorting 64 and 32-bit data types by up to 1.6x and 1.3x respectively.
New Contributors
- @ruclz made their first contribution in #2
- @bkmgit made their first contribution in #21
- @mosullivan93 made their first contribution in #13
Pull request references:
[1] Add AVX-512 argsort for 32 and 64-bit data type by @r-devulap in #34
[2] numpy/numpy#23707
[3] Implement partial sorting algorithms by @mosullivan93 in #13
[4] add key-value file! by @ruclz in #2
[5] Add qsort for _Float16 using AVX-512 FP16 ISA by @r-devulap in #16
[6] Improve qsort by @r-devulap in #33
[7] numpy/numpy#23435
[8] oceanbase/oceanbase#1325
Full Changelog: v1.0...v2.0