8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR #16124

vamsi-parasa · 2023-10-10T16:44:03Z

The goal of this PR is to address the follow-up comments to the SIMD accelerated sort PR (#14227) which implemented AVX512 intrinsics for Arrays.sort() methods.
The proposed changes are:

Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR.
Addressing the build failure due to a bug in GCC 12 (which was fixed in version 12.3.1). The details of the bug are at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
Minor changes in Javadoc strings

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR (Task - P4)

Reviewers

Jatin Bhateja (@jatin-bhateja - Reviewer) ⚠️ Review applies to 1138dc38
Vladimir Kozlov (@vnkozlov - Reviewer)
Magnus Ihse Bursie (@magicus - Reviewer)
Sandhya Viswanathan (@sviswa7 - Reviewer) ⚠️ Review applies to 1de07fb7

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16124/head:pull/16124
$ git checkout pull/16124

Update a local copy of the PR:
$ git checkout pull/16124
$ git pull https://git.openjdk.org/jdk.git pull/16124/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16124

View PR using the GUI difftool:
$ git pr show -t 16124

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16124.diff

Webrev

Link to Webrev Comment

… CPUs

bridgekeeper · 2023-10-10T16:45:11Z

👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-10-10T16:47:08Z

@vamsi-parasa The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-10-10T16:51:45Z

Webrevs

jatin-bhateja

LGTM

openjdk · 2023-10-10T17:36:41Z

⚠️ @vamsi-parasa the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout pr_avx512sort
$ git commit --author='Preferred Full Name <[email protected]>' --allow-empty -m 'Update full name'
$ git push

openjdk · 2023-10-10T17:36:42Z

@vamsi-parasa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR

Reviewed-by: jbhateja, kvn, ihse, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 36 new commits pushed to the master branch:

839cb19: 8317581: [s390x] Multiple test failure with LockingMode=2
387896f: 8309621: [XWayland][Screencast] screen capture failure with sun.java2d.uiScale other than 1
8d2ad2b: 8317977: update problemlist to include MacOS for sun/security/tools/keytool/NssTest.java
b92de54: 8317964: java/awt/Mouse/MouseModifiersUnitTest/MouseModifiersUnitTest_Standard.java fails on macosx-all after JDK-8317751
2a80160: 8314283: Support for NSS tests on aarch64 platforms
3f6d016: 8314896: additional clarifications to reversed() default methods' implementation requirements
2d46b29: 8317874: Add @sealedGraph to StringTemplate.Processor.Linkage
8f8c45b: 8315716: RISC-V: implement ChaCha20 intrinsic
8a9c4d5: 8317675: Serial: Move gc/shared/generation to serial folder
bcafec5: 8316958: Add test for unstructured locking
... and 26 more: https://git.openjdk.org/jdk/compare/f61499c73fe03e2e3680d7f58a84183364c5c5ac...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja, @vnkozlov, @magicus, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

vnkozlov

What is change for "Addressing the build failure due to a bug in GCC 12"?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp

R1chterScale · 2023-10-10T18:40:30Z

Forgive me, I might be missing something very obvious, but is there any particular reason to entirely disable the SIMD accelerated sort on Zen 4 rather than having an alternate code path for Zen 4 where it has the compressstoreu instructions split up into separate compress and storeu instructions so that Zen 4 platforms can still benefit from a decent degree of performance uplift from AVX512 acceleration of sort?

vamsi-parasa · 2023-10-10T18:51:39Z

Hello Vladimir (@vnkozlov) and Erik (@erikj79),

Please see the latest commit which has the #pragma based workaround to address the build failure caused by GCC 12.

Thanks,
Vamsi

magicus · 2023-10-10T19:48:50Z

In #14227, you inadvertently added an extra space at line 230 in make/modules/java.base/Lib.gmk
(https://github.com/openjdk/jdk/pull/14227/files#diff-c2e113e4b2661697750fd5e6dcc0908fa98563ccfb3801c8b0e3a70174041b81).

Can you please revert it here?

vamsi-parasa · 2023-10-10T20:04:48Z

In #14227, you inadvertently added an extra space at line 230 in make/modules/java.base/Lib.gmk

Hi Magnus (@magicus), please see the extra space fixed in the latest commit.

Thanks,
Vamsi

iaroslavski · 2023-10-10T20:14:51Z

Hi @vamsi-parasa ,

Methods partitionDualPivot and partitionSinglePivot are annotated by @ForceInline, but
mixedInsertionSort and insertionSort are not. These all 4 methods are passed as parameter
to methods private static <A> int[] partition(Class<?> elemType, A array, ...
and private static <A> void sort(Class<?> elemType, A array, ....

Is ok that partitionDualPivot, partitionSinglePivot and mixedInsertionSort, insertionSort are annotated differently?

Should we add @ForceInline to mixedInsertionSort, insertionSort or
remove @ForceInline from partitionDualPivot and partitionSinglePivot?

vnkozlov · 2023-10-10T22:05:42Z

Is ok that partitionDualPivot, partitionSinglePivot and mixedInsertionSort, insertionSort are annotated differently?

Good question. Someone familiar with this Java code should answer.

Note, @forceinline annotation is used by C2 JIT compiler when it decide to inline. One criteria is the bytecode size (we don't want inline huge method into another compilation). Default value is 35. With @forceinline the size will be 1 - most likely guarantee inlining.
But there is drawback, if java method is not "hot" (not called frequently) the inlined code may lead to not optimal code generation.
Use it carefully.

vnkozlov · 2023-10-10T22:29:55Z

Also @forceinline in these changes only works for case when new intrinsics are not used.
I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics.
That will tell us which @forceinline annotations are needed.

iaroslavski · 2023-10-11T06:59:47Z

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

@vnkozlov Thank you for explanation, I will benchmark cases with and without @forceinline and provide results very soon.

iaroslavski · 2023-10-11T07:03:37Z

@vnkozlov
Are there any options to be sure that C2 JIT compiler is used during tests?

iaroslavski · 2023-10-11T07:10:57Z

Hi @vamsi-parasa,

Both methods mixedInsertionSort and insertionSort are covered by intrinsics.
But insertionSort is run on leftmnost (one) part only and on small ( < MAX_INSERTION_SORT_SIZE = 44) arrays.
Do we actually need to use intrinsics for it?

To have clear picture could you please run benchmarking to compare both cases:
current implementation and implementation with Java insertionSort only?

see changes sort(int.class, a, Unsafe.ARRAY_INT_BASE_OFFSET, low, high, DualPivotQuicksort::insertionSort);
->
insertionSort(a, low, high);

theRealAph · 2023-10-11T09:25:15Z

Forgive me, I might be missing something very obvious, but is there any particular reason to entirely disable the SIMD accelerated sort on Zen 4 rather than having an alternate code path for Zen 4 where it has the compressstoreu instructions split up into separate compress and storeu instructions so that Zen 4 platforms can still benefit from a decent degree of performance uplift from AVX512 acceleration of sort?

I don't think you're missing anything. This should be done, rather than disabling the intrinsic.

magicus

Build changes look good. Thanks for fixing this!

/label +build
/reviewers 2

openjdk · 2023-10-11T11:36:00Z

@magicus
The build label was successfully added.

iaroslavski · 2023-10-11T20:10:12Z

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

Added @ForceInline annotations to insertionSort and mixedInsertionSort as it is helping arrays of small sizes when intrinsics are disabled.
Thanks, Vamsi

@vamsi-parasa Please revert changes (adding @forceinline to insertionSort and mixedInsertionSort) - I checked: initinal version works faster.

@iaroslavski Vladimir, have you used -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_arraySort,_arrayPartition to disable the intrinsics? Also, it helps to have one warmup iteration of 30 secs to reduce run-to-run variance.

@vamsi-parasa
Yes, I tried with the disabled mentioned intrinsics.

vamsi-parasa · 2023-10-11T20:31:05Z

To have clear picture could you please run benchmarking to compare both cases: current implementation and implementation with Java insertionSort only?

see changes sort(int.class, a, Unsafe.ARRAY_INT_BASE_OFFSET, low, high, DualPivotQuicksort::insertionSort); -> insertionSort(a, low, high);

@iaroslavski
Yes, the intrinsic is needed.
I remember, the same question was asked in the original PR and data was provided in the original PR.
Even for a small say size=30, the AVX512 sort intrinsic is faster.

make/modules/java.base/Lib.gmk

iaroslavski · 2023-10-11T20:49:00Z

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

Added @ForceInline annotations to insertionSort and mixedInsertionSort as it is helping arrays of small sizes when intrinsics are disabled.
Thanks, Vamsi

@vamsi-parasa Please revert changes (adding @forceinline to insertionSort and mixedInsertionSort) - I checked: initinal version works faster.

@iaroslavski Vladimir, have you used -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_arraySort,_arrayPartition to disable the intrinsics? Also, it helps to have one warmup iteration of 30 secs to reduce run-to-run variance.

@vamsi-parasa Yes, I tried with the disabled mentioned intrinsics.

@vamsi-parasa, and no profit with @forceinline on insertionSort and mixedInsertionSort.
I suggest removing @forceinline from insertionSort and mixedInsertionSort.

vamsi-parasa · 2023-10-11T20:53:29Z

@vamsi-parasa, and no profit with @forceinline on insertionSort and mixedInsertionSort. I suggest removing @forceinline from insertionSort and mixedInsertionSort.

@iaroslavski, Please see the change reverted.

sviswa7 · 2023-10-11T21:12:31Z

Okay. I will start testing for current changes. @sviswa7 please file RFE for Zen 4. If we get patch for it we do followup changes in that RFE.

Thanks a lot Vladimir. RFE filed: https://bugs.openjdk.org/browse/JDK-8317976.

vnkozlov · 2023-10-11T22:12:33Z

I started testing for v04 assuming if you make make file changes they will be cosmetic.

iwanowww · 2023-10-11T23:14:26Z

Proposed patch has one disadvantage: there's no way to override ergonomics decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding the JVM.

For many other intrinsics there are flags which enable finer grained control over JVM behavior (e.g., UseVectorizedHashCodeIntrinsic). There were some complaints that there are way to many individual flags to control intrinsics, so now there's -XX:ControlIntrinsic=, but as of now ergonomics doesn't rely on it to control what intrinsics are enabled/disabled.

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

sviswa7 · 2023-10-11T23:40:55Z

Proposed patch has one disadvantage: there's no way to override ergonomics decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding the JVM.

For many other intrinsics there are flags which enable finer grained control over JVM behavior (e.g., UseVectorizedHashCodeIntrinsic). There were some complaints that there are way to many individual flags to control intrinsics, so now there's -XX:ControlIntrinsic=, but as of now ergonomics doesn't rely on it to control what intrinsics are enabled/disabled.

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

May be it could be done as part of https://bugs.openjdk.org/browse/JDK-8317976.

vnkozlov · 2023-10-12T00:35:29Z

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

May be it could be done as part of https://bugs.openjdk.org/browse/JDK-8317976.

It is not easy to do. You need to make sure that you not enable intrinsic by -XX:ControlIntrinsic= when it is not supported by hardware. I would say it is separate (second) RFE because we need to test it separately from current Zen 4 issue.

Note, we are not introducing regression to AMD with these changes - code is reverted back to before previous changes 14227. We have some time to propose patch.

iwanowww · 2023-10-12T02:17:14Z

FTR I'm perfectly fine with handling it separately.

It is not easy to do. You need to make sure that you not enable intrinsic by -XX:ControlIntrinsic=

There's VM_Version::is_intrinsic_supported() introduced specifically for such type of platform-specific checks. vmIntrinsics::is_intrinsic_available() takes into account both ControlIntrinsic/DisableIntrinsic and VM_Version::is_intrinsic_supported().

DanielThomas · 2023-10-12T03:44:28Z

At least on Saphire Rapids the emulation suggested here only imposes a 6% penalty for intSort, while also mitigating the performance issue on Zen 4.

Configuration summary:
* Name:           linux-x86_64-server-release
* Debug level:    release
* HS debug level: product
* JVM variants:   server
* JVM features:   server: 'cds compiler1 compiler2 epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
* Version string: 22-internal-adhoc.nfsuper.jdk (22-internal)
* Source date:    1697078366 (2023-10-12T02:39:26Z)

Tools summary:
* Boot JDK:       openjdk version "21" 2023-09-19 OpenJDK Runtime Environment Zulu21.28+86-SA (build 21+35) OpenJDK 64-Bit Server VM Zulu21.28+86-SA (build 21+35, mixed mode, sharing) (at /usr/lib/jvm/zulu-21-amd64)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 11.4.0 (at /usr/bin/gcc)
* C++ Compiler:   Version 11.4.0 (at /usr/bin/g++)

https://github.com/openjdk/jdk/compare/master...DanielThomas:jdk:dannyt/emulate-compressstoreu?expand=1

Not super familiar with the JDK testing infrastructure I'm afraid, but FWIW I dropped -shorttest from the two @run entries in test/jdk/java/util/Arrays/Sorting.java and it passes:

 /data/jtreg/build/images/jtreg/bin/jtreg -jdk /data/jdk/build/linux-x86_64-server-release/images/jdk test/jdk/java/util/Arrays/Sorting.java
Test results: passed: 1

Intel(R) Xeon(R) Platinum 8488C - Current

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.043 ?   0.006  us/op
ArraysSort.intSort       25  avgt    3     0.082 ?   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.205 ?   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.394 ?   0.048  us/op
ArraysSort.intSort      100  avgt    3     0.625 ?   0.003  us/op
ArraysSort.intSort     1000  avgt    3     5.759 ?   1.111  us/op
ArraysSort.intSort    10000  avgt    3    51.680 ?   3.568  us/op
ArraysSort.intSort   100000  avgt    3   777.339 ?  25.809  us/op
ArraysSort.intSort  1000000  avgt    3  8848.261 ? 954.475  us/op

Intel(R) Xeon(R) Platinum 8488C - Emulated

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.046 ?   0.002  us/op
ArraysSort.intSort       25  avgt    3     0.083 ?   0.004  us/op
ArraysSort.intSort       50  avgt    3     0.214 ?   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.411 ?   0.038  us/op
ArraysSort.intSort      100  avgt    3     0.658 ?   0.022  us/op
ArraysSort.intSort     1000  avgt    3     6.411 ?   0.497  us/op
ArraysSort.intSort    10000  avgt    3    55.996 ?   3.155  us/op
ArraysSort.intSort   100000  avgt    3   822.805 ?  40.223  us/op
ArraysSort.intSort  1000000  avgt    3  9487.974 ? 216.146  us/op

Intel(R) Xeon(R) Platinum 8488C - Baseline

Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.047 ?    0.006  us/op
ArraysSort.intSort       25  avgt    3      0.099 ?    0.022  us/op
ArraysSort.intSort       50  avgt    3      0.249 ?    0.024  us/op
ArraysSort.intSort       75  avgt    3      0.438 ?    0.046  us/op
ArraysSort.intSort      100  avgt    3      0.590 ?    0.079  us/op
ArraysSort.intSort     1000  avgt    3      8.384 ?    1.852  us/op
ArraysSort.intSort    10000  avgt    3    435.589 ?   23.647  us/op
ArraysSort.intSort   100000  avgt    3   5380.658 ?  491.435  us/op
ArraysSort.intSort  1000000  avgt    3  63857.189 ? 2746.106  us/op

AMD EPYC 9R14 - Emulated

$ make test TEST="micro:java.lang.ArraysSort.intSort"

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.032 ?   0.001  us/op
ArraysSort.intSort       25  avgt    3     0.067 ?   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.196 ?   0.002  us/op
ArraysSort.intSort       75  avgt    3     0.429 ?   0.046  us/op
ArraysSort.intSort      100  avgt    3     0.614 ?   0.025  us/op
ArraysSort.intSort     1000  avgt    3     6.500 ?   0.084  us/op
ArraysSort.intSort    10000  avgt    3    55.620 ?   0.943  us/op
ArraysSort.intSort   100000  avgt    3   669.347 ?  75.432  us/op
ArraysSort.intSort  1000000  avgt    3  9459.001 ? 201.298  us/op

AMD EPYC 9R14 - Baseline

$ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO="VM_OPTIONS=-XX:UseAVX=2"

Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.035 ?    0.016  us/op
ArraysSort.intSort       25  avgt    3      0.091 ?    0.009  us/op
ArraysSort.intSort       50  avgt    3      0.245 ?    0.002  us/op
ArraysSort.intSort       75  avgt    3      0.412 ?    0.004  us/op
ArraysSort.intSort      100  avgt    3      0.531 ?    0.003  us/op
ArraysSort.intSort     1000  avgt    3      8.803 ?    0.609  us/op
ArraysSort.intSort    10000  avgt    3    254.413 ?  153.004  us/op
ArraysSort.intSort   100000  avgt    3   4485.811 ?   17.517  us/op
ArraysSort.intSort  1000000  avgt    3  56552.132 ? 3124.280  us/op

vnkozlov

My tier1-3,xcomp testing for v04 passed. I am integrating these changes.
Lets continue discussion about changes for AMD in https://bugs.openjdk.org/browse/JDK-8317976.

vamsi-parasa · 2023-10-12T04:42:29Z

My tier1-3,xcomp testing for v04 passed. I am integrating these changes. Lets continue discussion about changes for AMD in https://bugs.openjdk.org/browse/JDK-8317976.

Thank you, Vladimir!

vamsi-parasa · 2023-10-12T04:42:54Z

/integrate

openjdk · 2023-10-12T04:43:38Z

@vamsi-parasa
Your change (at version 9bce680) is now ready to be sponsored by a Committer.

vnkozlov · 2023-10-12T04:44:45Z

/sponsor

openjdk · 2023-10-12T04:45:26Z

Going to push as commit 2edf9c3.
Since your change was applied there have been 36 commits pushed to the master branch:

839cb19: 8317581: [s390x] Multiple test failure with LockingMode=2
387896f: 8309621: [XWayland][Screencast] screen capture failure with sun.java2d.uiScale other than 1
8d2ad2b: 8317977: update problemlist to include MacOS for sun/security/tools/keytool/NssTest.java
b92de54: 8317964: java/awt/Mouse/MouseModifiersUnitTest/MouseModifiersUnitTest_Standard.java fails on macosx-all after JDK-8317751
2a80160: 8314283: Support for NSS tests on aarch64 platforms
3f6d016: 8314896: additional clarifications to reversed() default methods' implementation requirements
2d46b29: 8317874: Add @sealedGraph to StringTemplate.Processor.Linkage
8f8c45b: 8315716: RISC-V: implement ChaCha20 intrinsic
8a9c4d5: 8317675: Serial: Move gc/shared/generation to serial folder
bcafec5: 8316958: Add test for unstructured locking
... and 26 more: https://git.openjdk.org/jdk/compare/f61499c73fe03e2e3680d7f58a84183364c5c5ac...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-10-12T04:45:37Z

@vnkozlov @vamsi-parasa Pushed as commit 2edf9c3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

PaulSandoz · 2023-10-12T16:50:11Z

At least on Saphire Rapids the emulation suggested here only imposes a 6% penalty for intSort, while also mitigating the performance issue on Zen 4.

Thanks for testing this out, this seems a good solution to explore.

Is there is a way to ifdef switching on the CPU vendor? otherwise we could probably set a flag in the build script based of the CPU vendor, which may be more robust if different compilers are used.

tarsa · 2024-11-23T13:41:25Z

The answer for slow performance of AVX512 version of x86-simd-sort on Zen 4 is most probably explained in AMD manuals which could be found at: https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides

Software Optimization Guide for the AMD Zen4 Microarchitecture has following remark in "2.11.2 Code recommendations" chapter:

Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops.

Software Optimization Guide for the AMD Zen5 Microarchitecture doesn't have any remark about COMPRESS instructions.

Could you add some code that disables the AVX512 version on Zen4, but keeps it enabled on Zen5 and future Zen architectures?

8317763:Restrict AVX512 intrinsics for Arrays.sort() methods to Intel…

1138dc3

… CPUs

openjdk bot added the rfr Pull request is ready for review label Oct 10, 2023

openjdk bot added hotspot-compiler [email protected] core-libs [email protected] labels Oct 10, 2023

vamsi-parasa changed the title ~~8317763: Restrict AVX512 intrinsics for Arrays.sort() methods to Intel CPUs~~ 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() methods to Intel CPUs Oct 10, 2023

vamsi-parasa changed the title ~~8317763: Follow-up to AVX512 intrinsics for Arrays.sort() methods to Intel CPUs~~ 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR Oct 10, 2023

jatin-bhateja approved these changes Oct 10, 2023

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Oct 10, 2023

vnkozlov reviewed Oct 10, 2023

View reviewed changes

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp Outdated Show resolved Hide resolved

pragma workround for GCC12 bug

386bdd8

fix whitespace in build script

a565f1d

magicus approved these changes Oct 11, 2023

View reviewed changes

openjdk bot added the build [email protected] label Oct 11, 2023

erikj79 reviewed Oct 11, 2023

View reviewed changes

make/modules/java.base/Lib.gmk Show resolved Hide resolved

Revert @ForceInline annotations for small array sort methods

9bce680

openjdk bot removed the sponsor Pull request is ready to be sponsored label Oct 11, 2023

vnkozlov approved these changes Oct 12, 2023

View reviewed changes

openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 12, 2023

openjdk bot added the integrated Pull request has been integrated label Oct 12, 2023

openjdk bot closed this Oct 12, 2023

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Oct 12, 2023

tarsa mentioned this pull request Nov 28, 2024

performance on amd 7950x ... intel/x86-simd-sort#6

Open

8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR #16124

8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR #16124

Conversation

vamsi-parasa commented Oct 10, 2023 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Reviewing

Webrev

bridgekeeper bot commented Oct 10, 2023

openjdk bot commented Oct 10, 2023

mlbridge bot commented Oct 10, 2023 • edited Loading

Webrevs

jatin-bhateja left a comment

Choose a reason for hiding this comment

openjdk bot commented Oct 10, 2023

openjdk bot commented Oct 10, 2023 • edited Loading

vnkozlov left a comment

Choose a reason for hiding this comment

R1chterScale commented Oct 10, 2023 • edited by bridgekeeper bot Loading

vamsi-parasa commented Oct 10, 2023

magicus commented Oct 10, 2023

vamsi-parasa commented Oct 10, 2023 • edited Loading

iaroslavski commented Oct 10, 2023

vnkozlov commented Oct 10, 2023

vnkozlov commented Oct 10, 2023

iaroslavski commented Oct 11, 2023

iaroslavski commented Oct 11, 2023

iaroslavski commented Oct 11, 2023

theRealAph commented Oct 11, 2023

magicus left a comment

Choose a reason for hiding this comment

openjdk bot commented Oct 11, 2023

iaroslavski commented Oct 11, 2023

vamsi-parasa commented Oct 11, 2023 • edited Loading

iaroslavski commented Oct 11, 2023

vamsi-parasa commented Oct 11, 2023

sviswa7 commented Oct 11, 2023

vnkozlov commented Oct 11, 2023

iwanowww commented Oct 11, 2023

sviswa7 commented Oct 11, 2023

vnkozlov commented Oct 12, 2023

iwanowww commented Oct 12, 2023

DanielThomas commented Oct 12, 2023 • edited Loading

Intel(R) Xeon(R) Platinum 8488C - Current

Intel(R) Xeon(R) Platinum 8488C - Emulated

Intel(R) Xeon(R) Platinum 8488C - Baseline

AMD EPYC 9R14 - Emulated

AMD EPYC 9R14 - Baseline

vnkozlov left a comment

Choose a reason for hiding this comment

vamsi-parasa commented Oct 12, 2023

vamsi-parasa commented Oct 12, 2023

openjdk bot commented Oct 12, 2023

vnkozlov commented Oct 12, 2023

openjdk bot commented Oct 12, 2023

openjdk bot commented Oct 12, 2023

PaulSandoz commented Oct 12, 2023

tarsa commented Nov 23, 2024

vamsi-parasa commented Oct 10, 2023 •

edited by openjdk bot

Loading

mlbridge bot commented Oct 10, 2023 •

edited

Loading

openjdk bot commented Oct 10, 2023 •

edited

Loading

R1chterScale commented Oct 10, 2023 •

edited by bridgekeeper bot

Loading

vamsi-parasa commented Oct 10, 2023 •

edited

Loading

vamsi-parasa commented Oct 11, 2023 •

edited

Loading

DanielThomas commented Oct 12, 2023 •

edited

Loading