Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX2 implementation of mulcache #579

Closed
wants to merge 10 commits into from

Conversation

dkostic
Copy link

@dkostic dkostic commented Dec 24, 2024

Summary:
AVX2 implementation of mulcache

Slight performance improvement:

|--------------------------------------|
|  MLKEM 512 | Before | After | Improv |
|    keypair |  15101 | 14835 |  1.02x |
|     encaps |  19664 | 18631 |  1.05x |
|     decaps |  25824 | 24420 |  1.05x |
|--------------------------------------|
|  MLKEM 768 | Before | After | Improv |
|    keypair |  26187 | 25157 |  1.04x |
|     encaps |  28014 | 27248 |  1.03x |
|     decaps |  36989 | 35659 |  1.03x |
|--------------------------------------|
| MLKEM 1024 | Before | After | Improv |
|    keypair |  36014 | 35630 |  1.01x |
|     encaps |  39797 | 39347 |  1.00x |
|     decaps |  52139 | 51524 |  1.01x |
|--------------------------------------|

measured on Intel(R) Xeon(R) Platinum 8488C.

Addresses #477 .

Steps:
If your pull request consists of multiple sequential changes, please describe them here:

Performed local tests:

  • lint passing
  • tests all passing
  • tests bench passing
  • tests cbmc passing

Do you expect this change to impact performance: Yes, see above.

@dkostic dkostic requested a review from a team as a code owner December 24, 2024 19:54
Copy link
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @dkostic for working that out! Up to 5% performance uplift is worth taking, esp. since it simplifies the code and makes it more uniform across backends.

Left some small comments. The only non trivial question is whether AVX2 allows to accumulate in 32bit, in which case we can defer the reduction to after accumulation, in line with the C and AArch64 backends, also further improving performance.

@hanno-becker
Copy link
Contributor

hanno-becker commented Jan 4, 2025

@dkostic There is a segfault in the clang-18 compiler test. Can you have a look? To reproduce, enter the clang18 nix shell via nix develop --extra-experimental-features 'nix-command flakes' .#ci_clang18 and test with --opt=OPT --cflags="-O1".

@hanno-becker hanno-becker self-requested a review January 4, 2025 06:02

void poly_mulcache_compute_avx2(poly_mulcache *x, const poly *y)
{
int j;
Copy link
Contributor

@hanno-becker hanno-becker Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aim to use assembly-level verification using HOL-Light, could we implement this in ASM rather than intrinsics, please? ISTM that this should essentilly be the code followingTODO: This could be precomputed in the mulcache in (the old) basemul.S.

(I'm aware there's more intrinsics-based code, but it will all need to be rewritten, so let's not introduce more of it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, you want me to do it as part of this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably, yes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI is still failing in a few places. Once pq-code-package/tsc#125 is merged, you will be able to open a PR from the main repo, and run the full CI yourself.

@hanno-becker
Copy link
Contributor

hanno-becker commented Jan 7, 2025

@dkostic Ok this is done now. You have developer rights, so should be able to push branches to the main repo and open PRs from there (closing this one). Can you do that for this PR?

dkostic added 9 commits January 8, 2025 18:35
Slight performance improvement:

|--------------------------------------|
|  MLKEM 512 | Before | After | Improv |
|    keypair |  15101 | 14835 |  1.02x |
|     encaps |  19664 | 18631 |  1.05x |
|     decaps |  25824 | 24420 |  1.05x |
|--------------------------------------|
|  MLKEM 768 | Before | After | Improv |
|    keypair |  26187 | 25157 |  1.04x |
|     encaps |  28014 | 27248 |  1.03x |
|     decaps |  36989 | 35659 |  1.03x |
|--------------------------------------|
| MLKEM 1024 | Before | After | Improv |
|    keypair |  36014 | 35630 |  1.01x |
|     encaps |  39797 | 39347 |  1.00x |
|     decaps |  52139 | 51524 |  1.01x |
|--------------------------------------|

measured on Intel(R) Xeon(R) Platinum 8488C.

Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
Signed-off-by: dkostic <[email protected]>
@dkostic
Copy link
Author

dkostic commented Jan 8, 2025

@dkostic Ok this is done now. You have developer rights, so should be able to push branches to the main repo and open PRs from there (closing this one). Can you do that for this PR?

I'd prefer if we can finish this PR as is. I can start pushing to the main repo with the next one.

@hanno-becker
Copy link
Contributor

hanno-becker commented Jan 8, 2025

@dkostic I'm afraid we definitely need to move the PR because the full CI does not run otherwise. I would have done it in case you hadn't yet got the necessary rights -- but now you do! 🙂 But if it's too much bother for you, I can also do it - let me know.

@dkostic
Copy link
Author

dkostic commented Jan 8, 2025

@dkostic I'm afraid we definitely need to move the PR because the full CI does not run otherwise.

oh, so it was you who started the CI for my last update?

nevertheless, is this the development process for the project:

  1. clone the main repo
  2. create a local branch
  3. do changes
  4. push the local branch to the main repo
  5. open a pr

@hanno-becker
Copy link
Contributor

oh, so it was you who started the CI for my last update?

No I think the last update was automatic; but because of the way we authenticate to EC2 (OICD), large parts of the CI can currently only run for PRs from the main repo.

clone the main repo
create a local branch
do changes
push the local branch to the main repo
open a pr

Yes!

@dkostic
Copy link
Author

dkostic commented Jan 8, 2025

closing in favor of #624

@dkostic dkostic closed this Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AVX2: Use mulcache for base multiplication
3 participants