Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR #16124

Closed
wants to merge 5 commits into from

Conversation

vamsi-parasa
Copy link
Contributor

@vamsi-parasa vamsi-parasa commented Oct 10, 2023

The goal of this PR is to address the follow-up comments to the SIMD accelerated sort PR (#14227) which implemented AVX512 intrinsics for Arrays.sort() methods.
The proposed changes are:

  1. Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR.
  2. Addressing the build failure due to a bug in GCC 12 (which was fixed in version 12.3.1). The details of the bug are at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
  3. Minor changes in Javadoc strings

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR (Task - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16124/head:pull/16124
$ git checkout pull/16124

Update a local copy of the PR:
$ git checkout pull/16124
$ git pull https://git.openjdk.org/jdk.git pull/16124/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16124

View PR using the GUI difftool:
$ git pr show -t 16124

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16124.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 10, 2023

👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 10, 2023
@openjdk
Copy link

openjdk bot commented Oct 10, 2023

@vamsi-parasa The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Oct 10, 2023

Webrevs

@vamsi-parasa vamsi-parasa changed the title 8317763: Restrict AVX512 intrinsics for Arrays.sort() methods to Intel CPUs 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() methods to Intel CPUs Oct 10, 2023
@vamsi-parasa vamsi-parasa changed the title 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() methods to Intel CPUs 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR Oct 10, 2023
Copy link
Member

@jatin-bhateja jatin-bhateja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openjdk
Copy link

openjdk bot commented Oct 10, 2023

⚠️ @vamsi-parasa the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout pr_avx512sort
$ git commit --author='Preferred Full Name <[email protected]>' --allow-empty -m 'Update full name'
$ git push

@openjdk
Copy link

openjdk bot commented Oct 10, 2023

@vamsi-parasa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR

Reviewed-by: jbhateja, kvn, ihse, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 36 new commits pushed to the master branch:

  • 839cb19: 8317581: [s390x] Multiple test failure with LockingMode=2
  • 387896f: 8309621: [XWayland][Screencast] screen capture failure with sun.java2d.uiScale other than 1
  • 8d2ad2b: 8317977: update problemlist to include MacOS for sun/security/tools/keytool/NssTest.java
  • b92de54: 8317964: java/awt/Mouse/MouseModifiersUnitTest/MouseModifiersUnitTest_Standard.java fails on macosx-all after JDK-8317751
  • 2a80160: 8314283: Support for NSS tests on aarch64 platforms
  • 3f6d016: 8314896: additional clarifications to reversed() default methods' implementation requirements
  • 2d46b29: 8317874: Add @sealedGraph to StringTemplate.Processor.Linkage
  • 8f8c45b: 8315716: RISC-V: implement ChaCha20 intrinsic
  • 8a9c4d5: 8317675: Serial: Move gc/shared/generation to serial folder
  • bcafec5: 8316958: Add test for unstructured locking
  • ... and 26 more: https://git.openjdk.org/jdk/compare/f61499c73fe03e2e3680d7f58a84183364c5c5ac...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja, @vnkozlov, @magicus, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 10, 2023
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is change for "Addressing the build failure due to a bug in GCC 12"?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp Outdated Show resolved Hide resolved
@R1chterScale
Copy link

R1chterScale commented Oct 10, 2023

Forgive me, I might be missing something very obvious, but is there any particular reason to entirely disable the SIMD accelerated sort on Zen 4 rather than having an alternate code path for Zen 4 where it has the compressstoreu instructions split up into separate compress and storeu instructions so that Zen 4 platforms can still benefit from a decent degree of performance uplift from AVX512 acceleration of sort?

@vamsi-parasa
Copy link
Contributor Author

Hello Vladimir (@vnkozlov) and Erik (@erikj79),

Please see the latest commit which has the #pragma based workaround to address the build failure caused by GCC 12.

Thanks,
Vamsi

@magicus
Copy link
Member

magicus commented Oct 10, 2023

In #14227, you inadvertently added an extra space at line 230 in make/modules/java.base/Lib.gmk
(https://github.com/openjdk/jdk/pull/14227/files#diff-c2e113e4b2661697750fd5e6dcc0908fa98563ccfb3801c8b0e3a70174041b81).

Can you please revert it here?

@vamsi-parasa
Copy link
Contributor Author

vamsi-parasa commented Oct 10, 2023

In #14227, you inadvertently added an extra space at line 230 in make/modules/java.base/Lib.gmk

Hi Magnus (@magicus), please see the extra space fixed in the latest commit.

Thanks,
Vamsi

@iaroslavski
Copy link
Contributor

Hi @vamsi-parasa ,

Methods partitionDualPivot and partitionSinglePivot are annotated by @ForceInline, but
mixedInsertionSort and insertionSort are not. These all 4 methods are passed as parameter
to methods private static <A> int[] partition(Class<?> elemType, A array, ...
and private static <A> void sort(Class<?> elemType, A array, ....

Is ok that partitionDualPivot, partitionSinglePivot and mixedInsertionSort, insertionSort are annotated differently?

Should we add @ForceInline to mixedInsertionSort, insertionSort or
remove @ForceInline from partitionDualPivot and partitionSinglePivot?

@vnkozlov
Copy link
Contributor

Is ok that partitionDualPivot, partitionSinglePivot and mixedInsertionSort, insertionSort are annotated differently?

Good question. Someone familiar with this Java code should answer.

Note, @forceinline annotation is used by C2 JIT compiler when it decide to inline. One criteria is the bytecode size (we don't want inline huge method into another compilation). Default value is 35. With @forceinline the size will be 1 - most likely guarantee inlining.
But there is drawback, if java method is not "hot" (not called frequently) the inlined code may lead to not optimal code generation.
Use it carefully.

@vnkozlov
Copy link
Contributor

Also @forceinline in these changes only works for case when new intrinsics are not used.
I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics.
That will tell us which @forceinline annotations are needed.

@iaroslavski
Copy link
Contributor

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

@vnkozlov Thank you for explanation, I will benchmark cases with and without @forceinline and provide results very soon.

@iaroslavski
Copy link
Contributor

@vnkozlov
Are there any options to be sure that C2 JIT compiler is used during tests?

@iaroslavski
Copy link
Contributor

Hi @vamsi-parasa,

Both methods mixedInsertionSort and insertionSort are covered by intrinsics.
But insertionSort is run on leftmnost (one) part only and on small ( < MAX_INSERTION_SORT_SIZE = 44) arrays.
Do we actually need to use intrinsics for it?

To have clear picture could you please run benchmarking to compare both cases:
current implementation and implementation with Java insertionSort only?

see changes sort(int.class, a, Unsafe.ARRAY_INT_BASE_OFFSET, low, high, DualPivotQuicksort::insertionSort);
->
insertionSort(a, low, high);

@theRealAph
Copy link
Contributor

Forgive me, I might be missing something very obvious, but is there any particular reason to entirely disable the SIMD accelerated sort on Zen 4 rather than having an alternate code path for Zen 4 where it has the compressstoreu instructions split up into separate compress and storeu instructions so that Zen 4 platforms can still benefit from a decent degree of performance uplift from AVX512 acceleration of sort?

I don't think you're missing anything. This should be done, rather than disabling the intrinsic.

Copy link
Member

@magicus magicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build changes look good. Thanks for fixing this!

/label +build
/reviewers 2

@openjdk
Copy link

openjdk bot commented Oct 11, 2023

@magicus
The build label was successfully added.

@iaroslavski
Copy link
Contributor

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

Added @ForceInline annotations to insertionSort and mixedInsertionSort as it is helping arrays of small sizes when intrinsics are disabled.
Thanks, Vamsi

@vamsi-parasa Please revert changes (adding @forceinline to insertionSort and mixedInsertionSort) - I checked: initinal version works faster.

@iaroslavski Vladimir, have you used -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_arraySort,_arrayPartition to disable the intrinsics? Also, it helps to have one warmup iteration of 30 secs to reduce run-to-run variance.

@vamsi-parasa
Yes, I tried with the disabled mentioned intrinsics.

@vamsi-parasa
Copy link
Contributor Author

vamsi-parasa commented Oct 11, 2023

To have clear picture could you please run benchmarking to compare both cases: current implementation and implementation with Java insertionSort only?

see changes sort(int.class, a, Unsafe.ARRAY_INT_BASE_OFFSET, low, high, DualPivotQuicksort::insertionSort); -> insertionSort(a, low, high);

@iaroslavski
Yes, the intrinsic is needed.
I remember, the same question was asked in the original PR and data was provided in the original PR.
Even for a small say size=30, the AVX512 sort intrinsic is faster.

@iaroslavski
Copy link
Contributor

Also @forceinline in these changes only works for case when new intrinsics are not used. I would suggest to adapt/update JMH benchmark to cover all cases and see effect @forceinline without intrinsics. That will tell us which @forceinline annotations are needed.

Added @ForceInline annotations to insertionSort and mixedInsertionSort as it is helping arrays of small sizes when intrinsics are disabled.
Thanks, Vamsi

@vamsi-parasa Please revert changes (adding @forceinline to insertionSort and mixedInsertionSort) - I checked: initinal version works faster.

@iaroslavski Vladimir, have you used -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_arraySort,_arrayPartition to disable the intrinsics? Also, it helps to have one warmup iteration of 30 secs to reduce run-to-run variance.

@vamsi-parasa Yes, I tried with the disabled mentioned intrinsics.

@vamsi-parasa, and no profit with @forceinline on insertionSort and mixedInsertionSort.
I suggest removing @forceinline from insertionSort and mixedInsertionSort.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Oct 11, 2023
@vamsi-parasa
Copy link
Contributor Author

@vamsi-parasa, and no profit with @forceinline on insertionSort and mixedInsertionSort. I suggest removing @forceinline from insertionSort and mixedInsertionSort.

@iaroslavski, Please see the change reverted.

@sviswa7
Copy link

sviswa7 commented Oct 11, 2023

Okay. I will start testing for current changes. @sviswa7 please file RFE for Zen 4. If we get patch for it we do followup changes in that RFE.

Thanks a lot Vladimir. RFE filed: https://bugs.openjdk.org/browse/JDK-8317976.

@vnkozlov
Copy link
Contributor

I started testing for v04 assuming if you make make file changes they will be cosmetic.

@iwanowww
Copy link
Contributor

Proposed patch has one disadvantage: there's no way to override ergonomics decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding the JVM.

For many other intrinsics there are flags which enable finer grained control over JVM behavior (e.g., UseVectorizedHashCodeIntrinsic). There were some complaints that there are way to many individual flags to control intrinsics, so now there's -XX:ControlIntrinsic=, but as of now ergonomics doesn't rely on it to control what intrinsics are enabled/disabled.

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

@sviswa7
Copy link

sviswa7 commented Oct 11, 2023

Proposed patch has one disadvantage: there's no way to override ergonomics decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding the JVM.

For many other intrinsics there are flags which enable finer grained control over JVM behavior (e.g., UseVectorizedHashCodeIntrinsic). There were some complaints that there are way to many individual flags to control intrinsics, so now there's -XX:ControlIntrinsic=, but as of now ergonomics doesn't rely on it to control what intrinsics are enabled/disabled.

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

May be it could be done as part of https://bugs.openjdk.org/browse/JDK-8317976.

@vnkozlov
Copy link
Contributor

It makes sense to let -XX:ControlIntrinsic= overrule VM_Version::is_intel() check and enable the intrinsics when AVX512DQ is supported.

May be it could be done as part of https://bugs.openjdk.org/browse/JDK-8317976.

It is not easy to do. You need to make sure that you not enable intrinsic by -XX:ControlIntrinsic= when it is not supported by hardware. I would say it is separate (second) RFE because we need to test it separately from current Zen 4 issue.

Note, we are not introducing regression to AMD with these changes - code is reverted back to before previous changes 14227. We have some time to propose patch.

@iwanowww
Copy link
Contributor

FTR I'm perfectly fine with handling it separately.

It is not easy to do. You need to make sure that you not enable intrinsic by -XX:ControlIntrinsic=

There's VM_Version::is_intrinsic_supported() introduced specifically for such type of platform-specific checks. vmIntrinsics::is_intrinsic_available() takes into account both ControlIntrinsic/DisableIntrinsic and VM_Version::is_intrinsic_supported().

@DanielThomas
Copy link

DanielThomas commented Oct 12, 2023

At least on Saphire Rapids the emulation suggested here only imposes a 6% penalty for intSort, while also mitigating the performance issue on Zen 4.

Configuration summary:
* Name:           linux-x86_64-server-release
* Debug level:    release
* HS debug level: product
* JVM variants:   server
* JVM features:   server: 'cds compiler1 compiler2 epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
* Version string: 22-internal-adhoc.nfsuper.jdk (22-internal)
* Source date:    1697078366 (2023-10-12T02:39:26Z)

Tools summary:
* Boot JDK:       openjdk version "21" 2023-09-19 OpenJDK Runtime Environment Zulu21.28+86-SA (build 21+35) OpenJDK 64-Bit Server VM Zulu21.28+86-SA (build 21+35, mixed mode, sharing) (at /usr/lib/jvm/zulu-21-amd64)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 11.4.0 (at /usr/bin/gcc)
* C++ Compiler:   Version 11.4.0 (at /usr/bin/g++)

https://github.com/openjdk/jdk/compare/master...DanielThomas:jdk:dannyt/emulate-compressstoreu?expand=1

Not super familiar with the JDK testing infrastructure I'm afraid, but FWIW I dropped -shorttest from the two @run entries in test/jdk/java/util/Arrays/Sorting.java and it passes:

 /data/jtreg/build/images/jtreg/bin/jtreg -jdk /data/jdk/build/linux-x86_64-server-release/images/jdk test/jdk/java/util/Arrays/Sorting.java
Test results: passed: 1

Intel(R) Xeon(R) Platinum 8488C - Current

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.043 ?   0.006  us/op
ArraysSort.intSort       25  avgt    3     0.082 ?   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.205 ?   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.394 ?   0.048  us/op
ArraysSort.intSort      100  avgt    3     0.625 ?   0.003  us/op
ArraysSort.intSort     1000  avgt    3     5.759 ?   1.111  us/op
ArraysSort.intSort    10000  avgt    3    51.680 ?   3.568  us/op
ArraysSort.intSort   100000  avgt    3   777.339 ?  25.809  us/op
ArraysSort.intSort  1000000  avgt    3  8848.261 ? 954.475  us/op

Intel(R) Xeon(R) Platinum 8488C - Emulated

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.046 ?   0.002  us/op
ArraysSort.intSort       25  avgt    3     0.083 ?   0.004  us/op
ArraysSort.intSort       50  avgt    3     0.214 ?   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.411 ?   0.038  us/op
ArraysSort.intSort      100  avgt    3     0.658 ?   0.022  us/op
ArraysSort.intSort     1000  avgt    3     6.411 ?   0.497  us/op
ArraysSort.intSort    10000  avgt    3    55.996 ?   3.155  us/op
ArraysSort.intSort   100000  avgt    3   822.805 ?  40.223  us/op
ArraysSort.intSort  1000000  avgt    3  9487.974 ? 216.146  us/op

Intel(R) Xeon(R) Platinum 8488C - Baseline

Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.047 ?    0.006  us/op
ArraysSort.intSort       25  avgt    3      0.099 ?    0.022  us/op
ArraysSort.intSort       50  avgt    3      0.249 ?    0.024  us/op
ArraysSort.intSort       75  avgt    3      0.438 ?    0.046  us/op
ArraysSort.intSort      100  avgt    3      0.590 ?    0.079  us/op
ArraysSort.intSort     1000  avgt    3      8.384 ?    1.852  us/op
ArraysSort.intSort    10000  avgt    3    435.589 ?   23.647  us/op
ArraysSort.intSort   100000  avgt    3   5380.658 ?  491.435  us/op
ArraysSort.intSort  1000000  avgt    3  63857.189 ? 2746.106  us/op

AMD EPYC 9R14 - Emulated

$ make test TEST="micro:java.lang.ArraysSort.intSort"

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.032 ?   0.001  us/op
ArraysSort.intSort       25  avgt    3     0.067 ?   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.196 ?   0.002  us/op
ArraysSort.intSort       75  avgt    3     0.429 ?   0.046  us/op
ArraysSort.intSort      100  avgt    3     0.614 ?   0.025  us/op
ArraysSort.intSort     1000  avgt    3     6.500 ?   0.084  us/op
ArraysSort.intSort    10000  avgt    3    55.620 ?   0.943  us/op
ArraysSort.intSort   100000  avgt    3   669.347 ?  75.432  us/op
ArraysSort.intSort  1000000  avgt    3  9459.001 ? 201.298  us/op

AMD EPYC 9R14 - Baseline

$ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO="VM_OPTIONS=-XX:UseAVX=2"

Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.035 ?    0.016  us/op
ArraysSort.intSort       25  avgt    3      0.091 ?    0.009  us/op
ArraysSort.intSort       50  avgt    3      0.245 ?    0.002  us/op
ArraysSort.intSort       75  avgt    3      0.412 ?    0.004  us/op
ArraysSort.intSort      100  avgt    3      0.531 ?    0.003  us/op
ArraysSort.intSort     1000  avgt    3      8.803 ?    0.609  us/op
ArraysSort.intSort    10000  avgt    3    254.413 ?  153.004  us/op
ArraysSort.intSort   100000  avgt    3   4485.811 ?   17.517  us/op
ArraysSort.intSort  1000000  avgt    3  56552.132 ? 3124.280  us/op

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tier1-3,xcomp testing for v04 passed. I am integrating these changes.
Lets continue discussion about changes for AMD in https://bugs.openjdk.org/browse/JDK-8317976.

@vamsi-parasa
Copy link
Contributor Author

My tier1-3,xcomp testing for v04 passed. I am integrating these changes. Lets continue discussion about changes for AMD in https://bugs.openjdk.org/browse/JDK-8317976.

Thank you, Vladimir!

@vamsi-parasa
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 12, 2023
@openjdk
Copy link

openjdk bot commented Oct 12, 2023

@vamsi-parasa
Your change (at version 9bce680) is now ready to be sponsored by a Committer.

@vnkozlov
Copy link
Contributor

/sponsor

@openjdk
Copy link

openjdk bot commented Oct 12, 2023

Going to push as commit 2edf9c3.
Since your change was applied there have been 36 commits pushed to the master branch:

  • 839cb19: 8317581: [s390x] Multiple test failure with LockingMode=2
  • 387896f: 8309621: [XWayland][Screencast] screen capture failure with sun.java2d.uiScale other than 1
  • 8d2ad2b: 8317977: update problemlist to include MacOS for sun/security/tools/keytool/NssTest.java
  • b92de54: 8317964: java/awt/Mouse/MouseModifiersUnitTest/MouseModifiersUnitTest_Standard.java fails on macosx-all after JDK-8317751
  • 2a80160: 8314283: Support for NSS tests on aarch64 platforms
  • 3f6d016: 8314896: additional clarifications to reversed() default methods' implementation requirements
  • 2d46b29: 8317874: Add @sealedGraph to StringTemplate.Processor.Linkage
  • 8f8c45b: 8315716: RISC-V: implement ChaCha20 intrinsic
  • 8a9c4d5: 8317675: Serial: Move gc/shared/generation to serial folder
  • bcafec5: 8316958: Add test for unstructured locking
  • ... and 26 more: https://git.openjdk.org/jdk/compare/f61499c73fe03e2e3680d7f58a84183364c5c5ac...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 12, 2023
@openjdk openjdk bot closed this Oct 12, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Oct 12, 2023
@openjdk
Copy link

openjdk bot commented Oct 12, 2023

@vnkozlov @vamsi-parasa Pushed as commit 2edf9c3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@PaulSandoz
Copy link
Member

At least on Saphire Rapids the emulation suggested here only imposes a 6% penalty for intSort, while also mitigating the performance issue on Zen 4.

Thanks for testing this out, this seems a good solution to explore.

Is there is a way to ifdef switching on the CPU vendor? otherwise we could probably set a flag in the build script based of the CPU vendor, which may be more robust if different compilers are used.

@tarsa
Copy link

tarsa commented Nov 23, 2024

The answer for slow performance of AVX512 version of x86-simd-sort on Zen 4 is most probably explained in AMD manuals which could be found at: https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides

Software Optimization Guide for the AMD Zen4 Microarchitecture has following remark in "2.11.2 Code recommendations" chapter:

Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops.

Software Optimization Guide for the AMD Zen5 Microarchitecture doesn't have any remark about COMPRESS instructions.

Could you add some code that disables the AVX512 version on Zen4, but keeps it enabled on Zen5 and future Zen architectures?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.