
Introduce global config and reorganize backends #535

Merged: 3 commits, merged Dec 17, 2024

@hanno-becker (Contributor) commented Dec 16, 2024

This commit introduces a global configuration file mlkem/config.h
which should contain all user-configurable parameters. With this
commit, it contains:

  • MLKEM_K
  • MLKEM_NAMESPACE
  • FIPS202_NAMESPACE
  • MLKEM_USE_NATIVE
  • MLKEM_NATIVE_ARITH_BACKEND
  • MLKEM_NATIVE_FIPS202_BACKEND
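The following is a minimal sketch of what a configuration file holding these options could look like. The default values, the placeholder namespace prefixes and the header paths assigned to the two backend macros are assumptions for illustration, not the contents of the actual `mlkem/config.h`. The derived public-key size uses the FIPS 203 formula of 384*K + 32 bytes.

```c
#include <assert.h>
#include <string.h>

/* ---- Hypothetical config.h: each option gets a default that a build may
 * override, e.g. with -DMLKEM_K=4 on the compiler command line. ---- */

/* Parameter set: 2 = ML-KEM-512, 3 = ML-KEM-768, 4 = ML-KEM-1024. */
#ifndef MLKEM_K
#define MLKEM_K 3
#endif

/* Symbol prefixes for ML-KEM and FIPS-202 code (placeholder values). */
#ifndef MLKEM_NAMESPACE
#define MLKEM_NAMESPACE(s) example_mlkem_##s
#endif
#ifndef FIPS202_NAMESPACE
#define FIPS202_NAMESPACE(s) example_fips202_##s
#endif

/* MLKEM_USE_NATIVE is undefined by default (pure C build). When a build
 * defines it, the backend macros name the metadata headers to use; the
 * paths below are placeholders. */
#if defined(MLKEM_USE_NATIVE)
#ifndef MLKEM_NATIVE_ARITH_BACKEND
#define MLKEM_NATIVE_ARITH_BACKEND "native/aarch64/opt.h"
#endif
#ifndef MLKEM_NATIVE_FIPS202_BACKEND
#define MLKEM_NATIVE_FIPS202_BACKEND "fips202/native/aarch64/opt.h"
#endif
#endif

/* ---- Library side: everything else is derived from the config. ---- */
static_assert(MLKEM_K >= 2 && MLKEM_K <= 4, "MLKEM_K must be 2, 3 or 4");

/* FIPS 203 public key: K encoded polynomials of 384 bytes plus a 32-byte seed. */
#define MLKEM_PUBLICKEYBYTES (384 * MLKEM_K + 32)

/* Namespaced symbol: with the defaults this defines example_mlkem_param_name. */
const char *MLKEM_NAMESPACE(param_name)(void) {
  return MLKEM_K == 2 ? "ML-KEM-512" : MLKEM_K == 3 ? "ML-KEM-768" : "ML-KEM-1024";
}
```

Because every option is wrapped in `#ifndef`, a build can override it without editing the file; compiling the same sources with `-DMLKEM_K=4`, for instance, would select ML-KEM-1024.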

The backends have been reorganized to follow a simpler file structure:

Every backend profile is identified by a metadata file in the top-level
directory of the backend. For example, aarch64 has opt.h and clean.h.
So far, these metadata files only set the name of the backend and point
to the actual implementation. The metadata file is kept separate from the
implementation so that assembly files can include it and determine whether
they should be assembled: for example, aarch64/opt.h sets
MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT, which guards all relevant files,
and likewise for the clean profile. Previously, all files were guarded more
coarsely by MLKEM_USE_NATIVE_AARCH64 or MLKEM_USE_NATIVE_X86_64; those
macros have been removed.
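The guard mechanism can be sketched in a single C file. The macro `MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT` comes from the description above; `MLKEM_NATIVE_ARITH_BACKEND_AARCH64_CLEAN`, the variable names and the commented-out include are hypothetical stand-ins. In the real tree the guarded files are assembly sources whose bodies simply vanish when their profile is not selected; here each body is replaced by a flag so that the effect can be checked.

```c
#include <assert.h>

/* ---- Sketch of the metadata header aarch64/opt.h. It only names the
 * profile; the include of the implementation is a placeholder. ---- */
#define MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT
/* #include "src/..."   hypothetical pointer to the implementation */

/* ---- Sketch of two backend source files. In practice these would be
 * assembly files that #include the metadata header and wrap their whole
 * body in the profile guard. Each "file" here records whether its body
 * survived preprocessing. ---- */
#if defined(MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT)
static const int opt_file_assembled = 1;   /* body of an opt-profile file */
#else
static const int opt_file_assembled = 0;
#endif

#if defined(MLKEM_NATIVE_ARITH_BACKEND_AARCH64_CLEAN) /* assumed macro name */
static const int clean_file_assembled = 1; /* skipped: clean.h not included */
#else
static const int clean_file_assembled = 0;
#endif

/* Exactly one arithmetic profile should contribute code to a build. */
static int active_arith_profiles(void) {
  int n = opt_file_assembled + clean_file_assembled;
  assert(n <= 1);
  return n;
}
```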

The source code of the backends has been moved into src directories.

Ultimately, we may want to split aarch64 into aarch64_opt and aarch64_clean
so that the distinction between profile and backend goes away; this is not
attempted yet.

====

A new example in examples/ demonstrates how to use a custom backend and a
custom config.
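The example itself is not reproduced here, but the general pattern of a backend replacing one routine while the core keeps a portable fallback can be sketched as follows. All names (`MY_BACKEND_USE_NATIVE_POLY_REDUCE`, `poly_reduce_native`, `poly_reduce_c`, `poly_reduce`, `poly_reduce_impl`) are hypothetical. The "native" routine is plain C standing in for assembly, and because it uses `%` and a conditional it is not constant-time; it only illustrates the wiring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MLKEM_N 256  /* coefficients per polynomial */
#define MLKEM_Q 3329 /* ML-KEM modulus */

/* ---- Hypothetical custom backend header: it advertises, via a
 * capability macro, which operation it replaces. ---- */
#define MY_BACKEND_USE_NATIVE_POLY_REDUCE

#if defined(MY_BACKEND_USE_NATIVE_POLY_REDUCE)
/* Stand-in for the backend's optimized routine (usually assembly):
 * maps every coefficient to its canonical representative in [0, q).
 * Not constant-time; illustration only. */
static void poly_reduce_native(int16_t r[MLKEM_N]) {
  for (int i = 0; i < MLKEM_N; i++) {
    int t = r[i] % MLKEM_Q; /* C truncates, so t lies in (-q, q) */
    r[i] = (int16_t)(t < 0 ? t + MLKEM_Q : t);
  }
}
#else
/* Portable C fallback, compiled only if no backend provides the routine. */
static void poly_reduce_c(int16_t r[MLKEM_N]) {
  for (int i = 0; i < MLKEM_N; i++)
    r[i] = (int16_t)(((r[i] % MLKEM_Q) + MLKEM_Q) % MLKEM_Q);
}
#endif

/* ---- Core library: picks the implementation at compile time. ---- */
void poly_reduce(int16_t r[MLKEM_N]) {
#if defined(MY_BACKEND_USE_NATIVE_POLY_REDUCE)
  poly_reduce_native(r);
#else
  poly_reduce_c(r);
#endif
}

/* Reports which path was compiled in, for checking the wiring. */
const char *poly_reduce_impl(void) {
#if defined(MY_BACKEND_USE_NATIVE_POLY_REDUCE)
  return "native";
#else
  return "c";
#endif
}
```

Selecting the implementation with the preprocessor rather than with function pointers costs nothing at run time, and the unused variant drops out of the build entirely.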

@hanno-becker force-pushed the profile_cleanup branch 9 times, most recently from 84ab059 to a462f02, December 16, 2024 06:50
@hanno-becker added the benchmark label (this PR should be benchmarked in CI) Dec 16, 2024
@oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29186 cycles | 29184 cycles | 1.00 |
| ML-KEM-512 encaps | 35547 cycles | 35554 cycles | 1.00 |
| ML-KEM-512 decaps | 46136 cycles | 46098 cycles | 1.00 |
| ML-KEM-768 keypair | 49229 cycles | 49232 cycles | 1.00 |
| ML-KEM-768 encaps | 55399 cycles | 55386 cycles | 1.00 |
| ML-KEM-768 decaps | 70203 cycles | 70236 cycles | 1.00 |
| ML-KEM-1024 keypair | 72171 cycles | 72218 cycles | 1.00 |
| ML-KEM-1024 encaps | 81016 cycles | 81129 cycles | 1.00 |
| ML-KEM-1024 decaps | 100875 cycles | 100871 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13523 cycles | 13505 cycles | 1.00 |
| ML-KEM-512 encaps | 17257 cycles | 17476 cycles | 0.99 |
| ML-KEM-512 decaps | 22741 cycles | 22734 cycles | 1.00 |
| ML-KEM-768 keypair | 22524 cycles | 22497 cycles | 1.00 |
| ML-KEM-768 encaps | 24529 cycles | 24466 cycles | 1.00 |
| ML-KEM-768 decaps | 32530 cycles | 32453 cycles | 1.00 |
| ML-KEM-1024 keypair | 31376 cycles | 31374 cycles | 1.00 |
| ML-KEM-1024 encaps | 34922 cycles | 34930 cycles | 1.00 |
| ML-KEM-1024 decaps | 45729 cycles | 45768 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18158 cycles | 18216 cycles | 1.00 |
| ML-KEM-512 encaps | 23194 cycles | 23145 cycles | 1.00 |
| ML-KEM-512 decaps | 30526 cycles | 30478 cycles | 1.00 |
| ML-KEM-768 keypair | 31067 cycles | 31108 cycles | 1.00 |
| ML-KEM-768 encaps | 34152 cycles | 34212 cycles | 1.00 |
| ML-KEM-768 decaps | 44834 cycles | 44770 cycles | 1.00 |
| ML-KEM-1024 keypair | 44603 cycles | 44518 cycles | 1.00 |
| ML-KEM-1024 encaps | 49935 cycles | 49892 cycles | 1.00 |
| ML-KEM-1024 decaps | 64438 cycles | 64383 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 20331 cycles | 20332 cycles | 1.00 |
| ML-KEM-512 encaps | 27000 cycles | 27002 cycles | 1.00 |
| ML-KEM-512 decaps | 35809 cycles | 35838 cycles | 1.00 |
| ML-KEM-768 keypair | 34889 cycles | 34882 cycles | 1.00 |
| ML-KEM-768 encaps | 38113 cycles | 38175 cycles | 1.00 |
| ML-KEM-768 decaps | 50917 cycles | 50904 cycles | 1.00 |
| ML-KEM-1024 keypair | 47988 cycles | 47974 cycles | 1.00 |
| ML-KEM-1024 encaps | 54179 cycles | 54135 cycles | 1.00 |
| ML-KEM-1024 decaps | 71615 cycles | 71703 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 15086 cycles | 15078 cycles | 1.00 |
| ML-KEM-512 encaps | 19688 cycles | 19663 cycles | 1.00 |
| ML-KEM-512 decaps | 26339 cycles | 26313 cycles | 1.00 |
| ML-KEM-768 keypair | 25688 cycles | 25609 cycles | 1.00 |
| ML-KEM-768 encaps | 28250 cycles | 28179 cycles | 1.00 |
| ML-KEM-768 decaps | 37912 cycles | 37856 cycles | 1.00 |
| ML-KEM-1024 keypair | 35680 cycles | 35661 cycles | 1.00 |
| ML-KEM-1024 encaps | 41087 cycles | 40971 cycles | 1.00 |
| ML-KEM-1024 decaps | 54544 cycles | 54496 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 34852 cycles | 34833 cycles | 1.00 |
| ML-KEM-512 encaps | 45035 cycles | 45065 cycles | 1.00 |
| ML-KEM-512 decaps | 58906 cycles | 58937 cycles | 1.00 |
| ML-KEM-768 keypair | 59125 cycles | 59101 cycles | 1.00 |
| ML-KEM-768 encaps | 71782 cycles | 71728 cycles | 1.00 |
| ML-KEM-768 decaps | 89196 cycles | 89239 cycles | 1.00 |
| ML-KEM-1024 keypair | 87509 cycles | 87839 cycles | 1.00 |
| ML-KEM-1024 encaps | 104651 cycles | 104235 cycles | 1.00 |
| ML-KEM-1024 decaps | 127490 cycles | 126864 cycles | 1.00 |

@oqs-bot left a comment

Graviton3

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 19000 cycles | 18993 cycles | 1.00 |
| ML-KEM-512 encaps | 23609 cycles | 23579 cycles | 1.00 |
| ML-KEM-512 decaps | 30774 cycles | 30754 cycles | 1.00 |
| ML-KEM-768 keypair | 32288 cycles | 32245 cycles | 1.00 |
| ML-KEM-768 encaps | 35729 cycles | 35712 cycles | 1.00 |
| ML-KEM-768 decaps | 45855 cycles | 45882 cycles | 1.00 |
| ML-KEM-1024 keypair | 46847 cycles | 46847 cycles | 1.00 |
| ML-KEM-1024 encaps | 52618 cycles | 52634 cycles | 1.00 |
| ML-KEM-1024 decaps | 66482 cycles | 66481 cycles | 1.00 |

@oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18191 cycles | 18202 cycles | 1.00 |
| ML-KEM-512 encaps | 22232 cycles | 22232 cycles | 1.00 |
| ML-KEM-512 decaps | 28966 cycles | 28996 cycles | 1.00 |
| ML-KEM-768 keypair | 30676 cycles | 30680 cycles | 1.00 |
| ML-KEM-768 encaps | 33721 cycles | 33736 cycles | 1.00 |
| ML-KEM-768 decaps | 43285 cycles | 43315 cycles | 1.00 |
| ML-KEM-1024 keypair | 44360 cycles | 44368 cycles | 1.00 |
| ML-KEM-1024 encaps | 49783 cycles | 49789 cycles | 1.00 |
| ML-KEM-1024 decaps | 62851 cycles | 62848 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 52423 cycles | 52148 cycles | 1.01 |
| ML-KEM-512 encaps | 65446 cycles | 65745 cycles | 1.00 |
| ML-KEM-512 decaps | 88563 cycles | 88346 cycles | 1.00 |
| ML-KEM-768 keypair | 84288 cycles | 84709 cycles | 1.00 |
| ML-KEM-768 encaps | 102049 cycles | 101766 cycles | 1.00 |
| ML-KEM-768 decaps | 131287 cycles | 132010 cycles | 0.99 |
| ML-KEM-1024 keypair | 124608 cycles | 124006 cycles | 1.00 |
| ML-KEM-1024 encaps | 145088 cycles | 145709 cycles | 1.00 |
| ML-KEM-1024 decaps | 182854 cycles | 183602 cycles | 1.00 |

@oqs-bot left a comment

Graviton2

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29191 cycles | 29193 cycles | 1.00 |
| ML-KEM-512 encaps | 35560 cycles | 35560 cycles | 1.00 |
| ML-KEM-512 decaps | 46144 cycles | 46109 cycles | 1.00 |
| ML-KEM-768 keypair | 49208 cycles | 49229 cycles | 1.00 |
| ML-KEM-768 encaps | 55410 cycles | 55407 cycles | 1.00 |
| ML-KEM-768 decaps | 70226 cycles | 70223 cycles | 1.00 |
| ML-KEM-1024 keypair | 72208 cycles | 72358 cycles | 1.00 |
| ML-KEM-1024 encaps | 81022 cycles | 81166 cycles | 1.00 |
| ML-KEM-1024 decaps | 100891 cycles | 100836 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 45797 cycles | 45723 cycles | 1.00 |
| ML-KEM-512 encaps | 56953 cycles | 56858 cycles | 1.00 |
| ML-KEM-512 decaps | 76303 cycles | 76248 cycles | 1.00 |
| ML-KEM-768 keypair | 74600 cycles | 74537 cycles | 1.00 |
| ML-KEM-768 encaps | 88666 cycles | 88586 cycles | 1.00 |
| ML-KEM-768 decaps | 114607 cycles | 114435 cycles | 1.00 |
| ML-KEM-1024 keypair | 109558 cycles | 109413 cycles | 1.00 |
| ML-KEM-1024 encaps | 127641 cycles | 127490 cycles | 1.00 |
| ML-KEM-1024 decaps | 160203 cycles | 160181 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 8dff47b | Previous: 668dbab | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 56596 cycles | 56618 cycles | 1.00 |
| ML-KEM-512 encaps | 69451 cycles | 69458 cycles | 1.00 |
| ML-KEM-512 decaps | 91407 cycles | 91377 cycles | 1.00 |
| ML-KEM-768 keypair | 91821 cycles | 91849 cycles | 1.00 |
| ML-KEM-768 encaps | 107799 cycles | 107762 cycles | 1.00 |
| ML-KEM-768 decaps | 136350 cycles | 136305 cycles | 1.00 |
| ML-KEM-1024 keypair | 134787 cycles | 134894 cycles | 1.00 |
| ML-KEM-1024 encaps | 155210 cycles | 155288 cycles | 1.00 |
| ML-KEM-1024 decaps | 191544 cycles | 191611 cycles | 1.00 |


@hanno-becker added and removed the benchmark label (this PR should be benchmarked in CI) three times, Dec 16, 2024
This commit introduces a global configuration file `mlkem/config.h`
which should contain all user-configurable parameters. With this
commit, it contains:

- MLKEM_K
- MLKEM_NAMESPACE
- FIPS202_NAMESPACE
- MLKEM_USE_NATIVE
- MLKEM_NATIVE_ARITH_BACKEND
- MLKEM_NATIVE_FIPS202_BACKEND

The backends have been reorganized to follow a simpler file structure:

Every backend profile is identified by a metadata file in the top-level
directory of the backend. For example, `aarch64` has `opt.h` and `clean.h`.
So far, these metadata files only set the name of the backend and point
to the actual implementation. The metadata file is kept separate from the
implementation so that assembly files can include it and determine whether
they should be assembled: for example, `aarch64/opt.h` sets
`MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT`, which guards all relevant files,
and likewise for the clean profile. Previously, all files were guarded more
coarsely by `MLKEM_USE_NATIVE_AARCH64` or `MLKEM_USE_NATIVE_X86_64`; those
macros have been removed.

The source code of the backends has been moved into `src` directories.

Ultimately, we may want to split `aarch64` into `aarch64_opt` and `aarch64_clean`
so that the distinction between profile and backend goes away; this is not
attempted yet.

Signed-off-by: Hanno Becker <[email protected]>
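The two namespace options listed in this commit can be illustrated by simulating two builds of the same source in one file. The prefixes, the helper names and the use of `#undef` to switch between "builds" are hypothetical; in practice each instance would presumably be a separate compilation with its own `MLKEM_K` and `MLKEM_NAMESPACE` settings, linked into one binary. Keeping `FIPS202_NAMESPACE` separate lets both ML-KEM instances call one shared FIPS-202 build, which is what the sketch assumes.

```c
#include <assert.h>

/* Shared FIPS-202 code, compiled once under its own (placeholder) prefix. */
#define FIPS202_NAMESPACE(s) shared_fips202_##s
int FIPS202_NAMESPACE(shake128_rate)(void) { return 168; } /* bytes per block */

/* "Build 1": the ML-KEM source compiled with K = 2. MLKEM_K is expanded
 * while each body is parsed, so the value is fixed per instance. */
#define MLKEM_K 2
#define MLKEM_NAMESPACE(s) mlkem512_##s
int MLKEM_NAMESPACE(public_key_bytes)(void) { return 384 * MLKEM_K + 32; }
int MLKEM_NAMESPACE(xof_rate)(void) { return FIPS202_NAMESPACE(shake128_rate)(); }
#undef MLKEM_K
#undef MLKEM_NAMESPACE

/* "Build 2": the same source again with K = 4 and a different prefix. */
#define MLKEM_K 4
#define MLKEM_NAMESPACE(s) mlkem1024_##s
int MLKEM_NAMESPACE(public_key_bytes)(void) { return 384 * MLKEM_K + 32; }
int MLKEM_NAMESPACE(xof_rate)(void) { return FIPS202_NAMESPACE(shake128_rate)(); }
#undef MLKEM_K
#undef MLKEM_NAMESPACE
```

Without distinct prefixes the two `public_key_bytes` definitions would collide at link time; with them, both parameter sets coexist while sharing the FIPS-202 symbols.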
@hanno-becker force-pushed the profile_cleanup branch 7 times, most recently from 85c1cdf to 8dff47b, December 16, 2024 20:38
@hanno-becker added and removed the benchmark label (this PR should be benchmarked in CI) Dec 16, 2024
This commit adds another minimal example to `examples/`, demonstrating
how to use a custom configuration file and a custom FIPS-202 backend.

Signed-off-by: Hanno Becker <[email protected]>
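A custom FIPS-202 backend of the kind this commit mentions can be pictured as a module that supplies the Keccak-f[1600] permutation, switched on by a custom configuration. In the sketch below, the capability macro `MY_FIPS202_USE_KECCAK_F1600_NATIVE`, the function names and the header path assigned to `MLKEM_NATIVE_FIPS202_BACKEND` are hypothetical, and the "backend" is a portable Keccak-f[1600] written in the compact style of the tiny_sha3 reference code rather than an optimized routine. The portable fallback a real library would keep is omitted, so the sketch refuses to compile without the backend.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Hypothetical custom config: native code on, custom FIPS-202 backend.
 * The header path is a placeholder, unused in this single-file sketch. ---- */
#define MLKEM_USE_NATIVE
#define MLKEM_NATIVE_FIPS202_BACKEND "my_fips202_backend.h"

/* ---- Hypothetical custom FIPS-202 backend ---- */
#define MY_FIPS202_USE_KECCAK_F1600_NATIVE /* advertises the permutation */

#define ROTL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

/* Iota round constants (FIPS 202). */
static const uint64_t keccak_rc[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};

/* Rho rotation amounts and Pi lane order, walked along the Pi cycle. */
static const int keccak_rotc[24] = {1,  3,  6,  10, 15, 21, 28, 36,
                                    45, 55, 2,  14, 27, 41, 56, 8,
                                    25, 43, 62, 18, 39, 61, 20, 44};
static const int keccak_piln[24] = {10, 7,  11, 17, 18, 3,  5,  16,
                                    8,  21, 24, 4,  15, 23, 19, 13,
                                    12, 2,  20, 14, 22, 9,  6,  1};

/* Keccak-f[1600] on 25 lanes; lane (x, y) is stored at st[x + 5*y]. */
static void my_keccak_f1600_native(uint64_t st[25]) {
  uint64_t t, bc[5];
  for (int round = 0; round < 24; round++) {
    /* Theta: xor each lane with the parities of two neighbouring columns. */
    for (int i = 0; i < 5; i++)
      bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (int i = 0; i < 5; i++) {
      t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
      for (int j = 0; j < 25; j += 5)
        st[j + i] ^= t;
    }
    /* Rho and Pi: rotate each lane and move it to its new position. */
    t = st[1];
    for (int i = 0; i < 24; i++) {
      int j = keccak_piln[i];
      bc[0] = st[j];
      st[j] = ROTL64(t, keccak_rotc[i]);
      t = bc[0];
    }
    /* Chi: non-linear mixing within each row of five lanes. */
    for (int j = 0; j < 25; j += 5) {
      for (int i = 0; i < 5; i++)
        bc[i] = st[j + i];
      for (int i = 0; i < 5; i++)
        st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
    /* Iota: add the round constant to lane (0, 0). */
    st[0] ^= keccak_rc[round];
  }
}

/* ---- Core library side: compile-time dispatch to the backend. ---- */
void keccak_f1600(uint64_t st[25]) {
#if defined(MLKEM_USE_NATIVE) && defined(MY_FIPS202_USE_KECCAK_F1600_NATIVE)
  my_keccak_f1600_native(st);
#else
#error "sketch omits the portable fallback; enable the custom backend"
#endif
}
```

A quick sanity check for such a backend is to permute the all-zero state and compare the result against the Keccak team's published intermediate values.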
@mkannwischer (Contributor) left a comment

LGTM. Thanks Hanno.

@hanno-becker merged commit b410700 into main Dec 17, 2024
46 checks passed
@hanno-becker deleted the profile_cleanup branch December 17, 2024 06:36
Labels: benchmark (this PR should be benchmarked in CI)
3 participants