Adds support for AVX2 for 32-bit types for quicksort and quickselect #60

Merged
r-devulap merged 19 commits into intel:main on Oct 24, 2023

Conversation

@sterrettm2 sterrettm2 commented Aug 7, 2023

This patch adds AVX2 support for 32-bit data types (int32_t, uint32_t, float) to quicksort and quickselect, along with benchmarks and tests for them. Below is a performance comparison of the scalar sort (old) against the SIMD sort (new) on random input:

Benchmark                                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarsort.*random vs. simdsort.*random]_128/uint32_t                 -0.3175         -0.3167          1401           956          1403           959
[scalarsort.*random vs. simdsort.*random]_256/uint32_t                 -0.4046         -0.4039          2044          1217          2045          1219
[scalarsort.*random vs. simdsort.*random]_512/uint32_t                 -0.4359         -0.4336          3625          2045          3626          2054
[scalarsort.*random vs. simdsort.*random]_1k/uint32_t                  -0.7731         -0.7725         15457          3507         15466          3518
[scalarsort.*random vs. simdsort.*random]_5k/uint32_t                  -0.9131         -0.9130        212809         18499        212806         18508
[scalarsort.*random vs. simdsort.*random]_100k/uint32_t                -0.9056         -0.9056       5718828        539986       5718520        539984
[scalarsort.*random vs. simdsort.*random]_1m/uint32_t                  -0.8999         -0.8999      68343297       6840011      68339358       6838741
[scalarsort.*random vs. simdsort.*random]_10m/uint32_t                 -0.8948         -0.8948     793869955      83553573     793781075      83543720
[scalarsort.*random vs. simdsort.*random]_128/int32_t                  -0.3357         -0.3341          1440           957          1442           960
[scalarsort.*random vs. simdsort.*random]_256/int32_t                  -0.4313         -0.4305          2153          1224          2154          1227
[scalarsort.*random vs. simdsort.*random]_512/int32_t                  -0.4535         -0.4524          3793          2073          3794          2078
[scalarsort.*random vs. simdsort.*random]_1k/int32_t                   -0.7714         -0.7712         15606          3568         15623          3575
[scalarsort.*random vs. simdsort.*random]_5k/int32_t                   -0.9138         -0.9138        219140         18888        219138         18897
[scalarsort.*random vs. simdsort.*random]_100k/int32_t                 -0.9063         -0.9063       5858561        549050       5858314        549068
[scalarsort.*random vs. simdsort.*random]_1m/int32_t                   -0.9003         -0.9003      69789096       6959065      69783875       6958046
[scalarsort.*random vs. simdsort.*random]_10m/int32_t                  -0.8949         -0.8949     808208079      84946240     808130032      84933987
[scalarsort.*random vs. simdsort.*random]_128/float                    -0.3752         -0.3741          1565           978          1568           981
[scalarsort.*random vs. simdsort.*random]_256/float                    -0.5006         -0.4985          2502          1250          2503          1255
[scalarsort.*random vs. simdsort.*random]_512/float                    -0.4848         -0.4841          4509          2323          4511          2327
[scalarsort.*random vs. simdsort.*random]_1k/float                     -0.7973         -0.7970         19684          3990         19698          3999
[scalarsort.*random vs. simdsort.*random]_5k/float                     -0.9247         -0.9247        248234         18689        248230         18700
[scalarsort.*random vs. simdsort.*random]_100k/float                   -0.9131         -0.9131       6895883        599356       6895408        599287
[scalarsort.*random vs. simdsort.*random]_1m/float                     -0.9064         -0.9064      81984199       7676841      81976275       7675610
[scalarsort.*random vs. simdsort.*random]_10m/float                    -0.9005         -0.9005     956696234      95182091     956582348      95170689
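
To make the table easier to read: the Time and CPU columns appear to be relative changes, (new - old) / old, so -0.90 means the SIMD sort took about 10% of the scalar time. The sketch below recomputes that figure and the equivalent speedup factor for a few rows copied from the table. The helper names `relative_change` and `speedup` are illustrative and not part of the patch or of the benchmark tooling, and the results can differ from the table in the last digit because the table's old/new times are themselves rounded.

```python
def relative_change(old: float, new: float) -> float:
    """Relative change from old to new; negative means new is faster."""
    return (new - old) / old


def speedup(old: float, new: float) -> float:
    """How many times faster new is than old."""
    return old / new


# (Time Old, Time New) pairs copied from the benchmark table above.
rows = {
    "128/uint32_t": (1401, 956),
    "10m/uint32_t": (793869955, 83553573),
    "10m/float": (956696234, 95182091),
}

for name, (old, new) in rows.items():
    print(f"{name}: change {relative_change(old, new):+.4f}, "
          f"speedup {speedup(old, new):.2f}x")
```

Read this way, the gain is modest for arrays of a few hundred elements (roughly 1.5x to 2x) and grows to roughly 9x to 13x once the array reaches about 5k elements.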

@r-devulap r-devulap left a comment


Reviewed only changes in the common qsort file.

Six review threads on src/xss-common-qsort.h (all outdated and resolved).

@r-devulap r-devulap left a comment


LGTM. Thank you @sterrettm2!

@r-devulap r-devulap merged commit 64908e7 into intel:main Oct 24, 2023
6 checks passed