[DRAFT] AArch64: Use transposed coefficient order in NTT domain #542

hanno-becker · 2024-12-17T21:32:42Z

This PR experiments with a transposed order of coefficients in the NTT domain for the AArch64 backend.

The motivation for such an ordering is the reduced shuffling cost in the NTT and invNTT. The downside is that shuffling is introduced during key generation and serialization of polynomials.

At this point, there are 'clean' AArch64 assembly implementations for (a) poly_tobytes, (b) poly_frombytes, (c) poly_transpose. Initial benchmarks suggest that the clean AArch64 assembly versions of at least poly_{to,from}bytes() have no performance benefit over C, so we may want to remove them. SLOTHY-optimizing them would first require adding support for further instructions to SLOTHY.

oqs-bot

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`29472` cycles	`29180` cycles	`1.01`
`ML-KEM-512 encaps`	`35726` cycles	`35550` cycles	`1.00`
`ML-KEM-512 decaps`	`46100` cycles	`46096` cycles	`1.00`
`ML-KEM-768 keypair`	`49729` cycles	`49220` cycles	`1.01`
`ML-KEM-768 encaps`	`55798` cycles	`55380` cycles	`1.01`
`ML-KEM-768 decaps`	`70297` cycles	`70208` cycles	`1.00`
`ML-KEM-1024 keypair`	`73281` cycles	`72237` cycles	`1.01`
`ML-KEM-1024 encaps`	`81874` cycles	`81077` cycles	`1.01`
`ML-KEM-1024 decaps`	`101362` cycles	`100881` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 4th gen (c7i)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`13504` cycles	`13533` cycles	`1.00`
`ML-KEM-512 encaps`	`17247` cycles	`17316` cycles	`1.00`
`ML-KEM-512 decaps`	`22712` cycles	`22850` cycles	`0.99`
`ML-KEM-768 keypair`	`22512` cycles	`22537` cycles	`1.00`
`ML-KEM-768 encaps`	`24512` cycles	`24505` cycles	`1.00`
`ML-KEM-768 decaps`	`32470` cycles	`32559` cycles	`1.00`
`ML-KEM-1024 keypair`	`31354` cycles	`31461` cycles	`1.00`
`ML-KEM-1024 encaps`	`34919` cycles	`34950` cycles	`1.00`
`ML-KEM-1024 decaps`	`45538` cycles	`45838` cycles	`0.99`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `34bd687`	Previous: `2dd10c1`	Ratio
`ML-KEM-512 keypair`	`13956` cycles	`13531` cycles	`1.03`

This comment was automatically generated by workflow using github-action-benchmark.
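
The alert above shows a ratio of `1.03` against a threshold of 1.03, which looks borderline only because of rounding. A minimal sketch of the arithmetic, assuming the ratio is the current cycle count divided by the previous one (the tables in this thread are consistent with that); the function names are hypothetical and not part of github-action-benchmark:

```c
#include <stdbool.h>
#include <stdint.h>

/* Ratio of current to previous cycle count; a value above 1 means the
 * current commit needs more cycles. Assumed definition, see lead-in. */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* True if the unrounded ratio exceeds the alert threshold. */
static bool exceeds_threshold(uint64_t current, uint64_t previous,
                              double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

For the alert row, `bench_ratio(13956, 13531)` is slightly above 1.031, so it exceeds 1.03 even though the table prints it as `1.03`.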

oqs-bot

Intel Xeon 3rd gen (c6i)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`20334` cycles	`20340` cycles	`1.00`
`ML-KEM-512 encaps`	`27010` cycles	`27297` cycles	`0.99`
`ML-KEM-512 decaps`	`36103` cycles	`35844` cycles	`1.01`
`ML-KEM-768 keypair`	`34902` cycles	`34886` cycles	`1.00`
`ML-KEM-768 encaps`	`38135` cycles	`38189` cycles	`1.00`
`ML-KEM-768 decaps`	`50945` cycles	`50924` cycles	`1.00`
`ML-KEM-1024 keypair`	`47965` cycles	`48027` cycles	`1.00`
`ML-KEM-1024 encaps`	`54139` cycles	`54197` cycles	`1.00`
`ML-KEM-1024 decaps`	`71555` cycles	`71804` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

AMD EPYC 3rd gen (c6a)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`18135` cycles	`18137` cycles	`1.00`
`ML-KEM-512 encaps`	`23188` cycles	`23201` cycles	`1.00`
`ML-KEM-512 decaps`	`30504` cycles	`30511` cycles	`1.00`
`ML-KEM-768 keypair`	`31069` cycles	`31078` cycles	`1.00`
`ML-KEM-768 encaps`	`34197` cycles	`34162` cycles	`1.00`
`ML-KEM-768 decaps`	`44765` cycles	`44729` cycles	`1.00`
`ML-KEM-1024 keypair`	`44632` cycles	`44565` cycles	`1.00`
`ML-KEM-1024 encaps`	`49913` cycles	`49897` cycles	`1.00`
`ML-KEM-1024 decaps`	`64436` cycles	`64402` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

AMD EPYC 4th gen (c7a)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`15148` cycles	`15077` cycles	`1.00`
`ML-KEM-512 encaps`	`19667` cycles	`19660` cycles	`1.00`
`ML-KEM-512 decaps`	`26296` cycles	`26308` cycles	`1.00`
`ML-KEM-768 keypair`	`25649` cycles	`25578` cycles	`1.00`
`ML-KEM-768 encaps`	`28150` cycles	`28154` cycles	`1.00`
`ML-KEM-768 decaps`	`37934` cycles	`37846` cycles	`1.00`
`ML-KEM-1024 keypair`	`35756` cycles	`35641` cycles	`1.00`
`ML-KEM-1024 encaps`	`41039` cycles	`40966` cycles	`1.00`
`ML-KEM-1024 decaps`	`54472` cycles	`54548` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`34885` cycles	`34897` cycles	`1.00`
`ML-KEM-512 encaps`	`44988` cycles	`44978` cycles	`1.00`
`ML-KEM-512 decaps`	`58930` cycles	`58945` cycles	`1.00`
`ML-KEM-768 keypair`	`59118` cycles	`59225` cycles	`1.00`
`ML-KEM-768 encaps`	`71690` cycles	`71828` cycles	`1.00`
`ML-KEM-768 decaps`	`89296` cycles	`89397` cycles	`1.00`
`ML-KEM-1024 keypair`	`87543` cycles	`87542` cycles	`1.00`
`ML-KEM-1024 encaps`	`104592` cycles	`104646` cycles	`1.00`
`ML-KEM-1024 decaps`	`127574` cycles	`127678` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton3

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`19218` cycles	`19003` cycles	`1.01`
`ML-KEM-512 encaps`	`23623` cycles	`23602` cycles	`1.00`
`ML-KEM-512 decaps`	`30590` cycles	`30765` cycles	`0.99`
`ML-KEM-768 keypair`	`32665` cycles	`32272` cycles	`1.01`
`ML-KEM-768 encaps`	`35910` cycles	`35727` cycles	`1.01`
`ML-KEM-768 decaps`	`45865` cycles	`45872` cycles	`1.00`
`ML-KEM-1024 keypair`	`47546` cycles	`46840` cycles	`1.02`
`ML-KEM-1024 encaps`	`53097` cycles	`52612` cycles	`1.01`
`ML-KEM-1024 decaps`	`66766` cycles	`66495` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`56603` cycles	`56640` cycles	`1.00`
`ML-KEM-512 encaps`	`69452` cycles	`69527` cycles	`1.00`
`ML-KEM-512 decaps`	`91509` cycles	`91505` cycles	`1.00`
`ML-KEM-768 keypair`	`91842` cycles	`91903` cycles	`1.00`
`ML-KEM-768 encaps`	`107769` cycles	`107822` cycles	`1.00`
`ML-KEM-768 decaps`	`136356` cycles	`136407` cycles	`1.00`
`ML-KEM-1024 keypair`	`134654` cycles	`134896` cycles	`1.00`
`ML-KEM-1024 encaps`	`155204` cycles	`155420` cycles	`1.00`
`ML-KEM-1024 decaps`	`191555` cycles	`191704` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton2

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`29481` cycles	`29190` cycles	`1.01`
`ML-KEM-512 encaps`	`35733` cycles	`35560` cycles	`1.00`
`ML-KEM-512 decaps`	`46114` cycles	`46112` cycles	`1.00`
`ML-KEM-768 keypair`	`49749` cycles	`49228` cycles	`1.01`
`ML-KEM-768 encaps`	`55801` cycles	`55407` cycles	`1.01`
`ML-KEM-768 decaps`	`70322` cycles	`70217` cycles	`1.00`
`ML-KEM-1024 keypair`	`73133` cycles	`72353` cycles	`1.01`
`ML-KEM-1024 encaps`	`81788` cycles	`81163` cycles	`1.01`
`ML-KEM-1024 decaps`	`101436` cycles	`100914` cycles	`1.01`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton4

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`18352` cycles	`18199` cycles	`1.01`
`ML-KEM-512 encaps`	`22243` cycles	`22233` cycles	`1.00`
`ML-KEM-512 decaps`	`28830` cycles	`28985` cycles	`0.99`
`ML-KEM-768 keypair`	`31033` cycles	`30681` cycles	`1.01`
`ML-KEM-768 encaps`	`33948` cycles	`33725` cycles	`1.01`
`ML-KEM-768 decaps`	`43387` cycles	`43302` cycles	`1.00`
`ML-KEM-1024 keypair`	`45020` cycles	`44354` cycles	`1.02`
`ML-KEM-1024 encaps`	`50309` cycles	`49788` cycles	`1.01`
`ML-KEM-1024 decaps`	`63131` cycles	`62840` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`45740` cycles	`45720` cycles	`1.00`
`ML-KEM-512 encaps`	`56887` cycles	`56867` cycles	`1.00`
`ML-KEM-512 decaps`	`76250` cycles	`76234` cycles	`1.00`
`ML-KEM-768 keypair`	`74537` cycles	`74544` cycles	`1.00`
`ML-KEM-768 encaps`	`88575` cycles	`88570` cycles	`1.00`
`ML-KEM-768 decaps`	`114466` cycles	`114433` cycles	`1.00`
`ML-KEM-1024 keypair`	`109432` cycles	`109465` cycles	`1.00`
`ML-KEM-1024 encaps`	`127440` cycles	`127494` cycles	`1.00`
`ML-KEM-1024 decaps`	`160006` cycles	`160139` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`52412` cycles	`52178` cycles	`1.00`
`ML-KEM-512 encaps`	`65433` cycles	`65783` cycles	`0.99`
`ML-KEM-512 decaps`	`88545` cycles	`88428` cycles	`1.00`
`ML-KEM-768 keypair`	`84398` cycles	`84729` cycles	`1.00`
`ML-KEM-768 encaps`	`102134` cycles	`101502` cycles	`1.01`
`ML-KEM-768 decaps`	`131336` cycles	`132074` cycles	`0.99`
`ML-KEM-1024 keypair`	`124791` cycles	`124073` cycles	`1.01`
`ML-KEM-1024 encaps`	`145257` cycles	`145769` cycles	`1.00`
`ML-KEM-1024 decaps`	`182747` cycles	`183677` cycles	`0.99`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton3 (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`45389` cycles	`45386` cycles	`1.00`
`ML-KEM-512 encaps`	`54212` cycles	`54214` cycles	`1.00`
`ML-KEM-512 decaps`	`71145` cycles	`71155` cycles	`1.00`
`ML-KEM-768 keypair`	`74835` cycles	`74823` cycles	`1.00`
`ML-KEM-768 encaps`	`86077` cycles	`86063` cycles	`1.00`
`ML-KEM-768 decaps`	`108672` cycles	`108802` cycles	`1.00`
`ML-KEM-1024 keypair`	`111101` cycles	`111111` cycles	`1.00`
`ML-KEM-1024 encaps`	`125926` cycles	`125936` cycles	`1.00`
`ML-KEM-1024 decaps`	`154574` cycles	`154635` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton4 (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`42016` cycles	`41978` cycles	`1.00`
`ML-KEM-512 encaps`	`50090` cycles	`50164` cycles	`1.00`
`ML-KEM-512 decaps`	`66110` cycles	`66049` cycles	`1.00`
`ML-KEM-768 keypair`	`69111` cycles	`69057` cycles	`1.00`
`ML-KEM-768 encaps`	`79859` cycles	`79763` cycles	`1.00`
`ML-KEM-768 decaps`	`101129` cycles	`101019` cycles	`1.00`
`ML-KEM-1024 keypair`	`102212` cycles	`102456` cycles	`1.00`
`ML-KEM-1024 encaps`	`117206` cycles	`117443` cycles	`1.00`
`ML-KEM-1024 decaps`	`143653` cycles	`143389` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Graviton2 (no-opt)

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`71292` cycles	`71162` cycles	`1.00`
`ML-KEM-512 encaps`	`85135` cycles	`85065` cycles	`1.00`
`ML-KEM-512 decaps`	`112641` cycles	`112770` cycles	`1.00`
`ML-KEM-768 keypair`	`117612` cycles	`117261` cycles	`1.00`
`ML-KEM-768 encaps`	`135290` cycles	`135096` cycles	`1.00`
`ML-KEM-768 decaps`	`172010` cycles	`171735` cycles	`1.00`
`ML-KEM-1024 keypair`	`175111` cycles	`174233` cycles	`1.01`
`ML-KEM-1024 encaps`	`197212` cycles	`196442` cycles	`1.00`
`ML-KEM-1024 decaps`	`243384` cycles	`242511` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Bananapi bpi-f3 benchmarks

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`335027` cycles	`335046` cycles	`1.00`
`ML-KEM-512 encaps`	`445694` cycles	`445607` cycles	`1.00`
`ML-KEM-512 decaps`	`593806` cycles	`593856` cycles	`1.00`
`ML-KEM-768 keypair`	`556185` cycles	`556062` cycles	`1.00`
`ML-KEM-768 encaps`	`698052` cycles	`697865` cycles	`1.00`
`ML-KEM-768 decaps`	`890484` cycles	`889403` cycles	`1.00`
`ML-KEM-1024 keypair`	`821541` cycles	`821286` cycles	`1.00`
`ML-KEM-1024 encaps`	`998894` cycles	`998065` cycles	`1.00`
`ML-KEM-1024 decaps`	`1229586` cycles	`1230119` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`58878` cycles	`57964` cycles	`1.02`
`ML-KEM-512 encaps`	`66398` cycles	`65238` cycles	`1.02`
`ML-KEM-512 decaps`	`84689` cycles	`83987` cycles	`1.01`
`ML-KEM-768 keypair`	`100048` cycles	`97986` cycles	`1.02`
`ML-KEM-768 encaps`	`111377` cycles	`109131` cycles	`1.02`
`ML-KEM-768 decaps`	`137465` cycles	`135466` cycles	`1.01`
`ML-KEM-1024 keypair`	`151970` cycles	`148680` cycles	`1.02`
`ML-KEM-1024 encaps`	`169161` cycles	`164777` cycles	`1.03`
`ML-KEM-1024 decaps`	`203998` cycles	`200003` cycles	`1.02`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite	Current: `29a1909`	Previous: `271b362`	Ratio
`ML-KEM-512 keypair`	`52188` cycles	`52303` cycles	`1.00`
`ML-KEM-512 encaps`	`58976` cycles	`59330` cycles	`0.99`
`ML-KEM-512 decaps`	`75202` cycles	`75340` cycles	`1.00`
`ML-KEM-768 keypair`	`88727` cycles	`88072` cycles	`1.01`
`ML-KEM-768 encaps`	`97462` cycles	`96577` cycles	`1.01`
`ML-KEM-768 decaps`	`120596` cycles	`119275` cycles	`1.01`
`ML-KEM-1024 keypair`	`133786` cycles	`132209` cycles	`1.01`
`ML-KEM-1024 encaps`	`147253` cycles	`144750` cycles	`1.02`
`ML-KEM-1024 decaps`	`178307` cycles	`175884` cycles	`1.01`

This comment was automatically generated by workflow using github-action-benchmark.

mkannwischer

This does not result in better performance on any uArch. Why would you like to add this?

hanno-becker · 2024-12-18T04:00:58Z

> This does not result in better performance on any uArch. Why would you like to add this?

@mkannwischer This is part of an exploration with @jargh on whether it's useful to use the transposed NTT order for AArch64 (as we do for x86_64). However, since poly_{to,from}bytes() work in NTT domain, they need adjusting, and therefore we currently require that native implementations be present.
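
To illustrate why poly_{to,from}bytes() need adjusting, here is a rough C sketch of a 12-bit serialization that reads its input from a transposed layout. The packing of two 12-bit coefficients into three bytes follows ML-KEM's byte encoding; the layout itself (4x8 transpose within each block of 32 coefficients) and the names `transposed_pos` and `poly_tobytes_transposed` are assumptions for illustration and may not match what this PR implements. Coefficients are assumed to be in [0, 4096).

```c
#include <stddef.h>
#include <stdint.h>

#define N_COEFFS 256
#define POLYBYTES 384 /* 256 coefficients * 12 bits / 8 */

/* Hypothetical layout: where the coefficient with standard index k
 * lives in memory when each 32-coefficient block is transposed. */
static size_t transposed_pos(size_t k)
{
  size_t block = k & ~(size_t)31; /* start of the 32-coefficient block */
  size_t off = k & 31;            /* standard offset inside the block */
  return block + 8 * (off % 4) + (off / 4);
}

/* Serialize in standard order while reading from the transposed layout. */
static void poly_tobytes_transposed(uint8_t r[POLYBYTES],
                                    const int16_t a[N_COEFFS])
{
  for (size_t i = 0; i < N_COEFFS / 2; i++)
  {
    uint16_t t0 = (uint16_t)a[transposed_pos(2 * i)];
    uint16_t t1 = (uint16_t)a[transposed_pos(2 * i + 1)];
    /* Pack two 12-bit values into three bytes, little-endian. */
    r[3 * i + 0] = (uint8_t)(t0 & 0xFF);
    r[3 * i + 1] = (uint8_t)((t0 >> 8) | ((t1 & 0x0F) << 4));
    r[3 * i + 2] = (uint8_t)(t1 >> 4);
  }
}
```

The serialized bytes are the same as for the standard layout; only the load pattern changes, which is the extra shuffling cost discussed above.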

hanno-becker · 2024-12-18T04:04:28Z

@mkannwischer Looks like removing the existing (clean, admittedly) poly_tobytes() also has no bearing on performance.

This commit adds an AArch64 implementation for `poly_frombytes()`. Like the already existing `poly_tobytes()`, we do not yet optimize it using SLOTHY, but work with the clean version in both the clean and the optimized backend. Applying SLOTHY to both needs work on the (micro)architecture models first. Signed-off-by: Hanno Becker <[email protected]>

This commit modifies the AArch64 arithmetic backend to use a transposed order of polynomial coefficients in NTT domain.

- In the forward NTT, this saves a st4
- In the inverse NTT, this saves a ld4
- No cost in the base multiplication: We merely need to shuffle the twiddles for the mulcache computation, which is done through a change to `autogenerate_files.py`.
- A temporary change is made to `polyvec.c`, adding the permutation before/after to/from bytes. This will be removed once those functions are adjusted to respect the custom order.
- For now, the coefficient permutation is written in simple-minded C. This will be removed in a subsequent commit.

Signed-off-by: Hanno Becker <[email protected]>
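
As a rough illustration of what a "simple-minded C" coefficient permutation could look like, the sketch below models the layout that results from replacing an interleaving `st4` of four 8-lane vectors with plain stores. The block shape (4 vectors of 8 16-bit lanes, i.e. blocks of 32 coefficients) and all names are assumptions; the commit itself may use a different shape.

```c
#include <stddef.h>
#include <stdint.h>

#define N_COEFFS 256 /* coefficients per polynomial */
#define LANES 8      /* 16-bit lanes in a 128-bit vector register */
#define VECS 4       /* registers written by one st4 / read by one ld4 */
#define BLOCK (LANES * VECS)

/* Standard -> transposed order: within each block, the coefficient at
 * standard offset VECS*i + j moves to offset LANES*j + i. */
static void poly_to_transposed(int16_t out[N_COEFFS],
                               const int16_t in[N_COEFFS])
{
  for (size_t b = 0; b < N_COEFFS; b += BLOCK)
    for (size_t i = 0; i < LANES; i++)
      for (size_t j = 0; j < VECS; j++)
        out[b + LANES * j + i] = in[b + VECS * i + j];
}

/* Transposed -> standard order: the inverse of the mapping above. */
static void poly_from_transposed(int16_t out[N_COEFFS],
                                 const int16_t in[N_COEFFS])
{
  for (size_t b = 0; b < N_COEFFS; b += BLOCK)
    for (size_t i = 0; i < LANES; i++)
      for (size_t j = 0; j < VECS; j++)
        out[b + VECS * i + j] = in[b + LANES * j + i];
}
```

Under this model, the permutation would be applied before `poly_tobytes()` and after `poly_frombytes()`, which is the role of the temporary change to `polyvec.c` described in the commit message.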

Also, add clean AArch64 assembly for custom order permutation. Signed-off-by: Hanno Becker <[email protected]>

Signed-off-by: Hanno Becker <[email protected]>

This commit reoptimizes NTT and invNTT with the custom order, using SLOTHY. We copy the clean versions of `poly_{tobytes,frombytes,transpose}` for now. This finishes a prototype of the optimized AArch64 backend using the custom NTT order. Signed-off-by: Hanno Becker <[email protected]>

hanno-becker · 2024-12-18T10:27:26Z

Agreed with @jargh that we are not pursuing this for now.

hanno-becker marked this pull request as ready for review December 17, 2024 21:32

hanno-becker requested a review from a team as a code owner December 17, 2024 21:32

hanno-becker force-pushed the aarch64_poly_frombytes branch from 08da95f to 34bd687 Compare December 17, 2024 21:33

hanno-becker added the `benchmark` label (this PR should be benchmarked in CI) Dec 17, 2024

oqs-bot reviewed Dec 17, 2024


mkannwischer reviewed Dec 17, 2024


mkannwischer force-pushed the aarch64_poly_frombytes branch from 34bd687 to 112e6ef Compare December 17, 2024 23:06

hanno-becker force-pushed the aarch64_poly_frombytes branch from 112e6ef to c1da0ac Compare December 18, 2024 03:34

hanno-becker added and removed the `benchmark` label (this PR should be benchmarked in CI) Dec 18, 2024

hanno-becker force-pushed the aarch64_poly_frombytes branch from c1da0ac to 4f0f2e8 Compare December 18, 2024 06:49

hanno-becker changed the title ~~AArch64: Add native poly_frombytes() implementation~~ [DRAFT] AArch64: Use transposed coefficient order in NTT domain Dec 18, 2024

hanno-becker added 5 commits December 18, 2024 06:52

AArch64: Integrate custom order into poly_frombytes()

a88c531

Also, add clean AArch64 assembly for custom order permutation. Signed-off-by: Hanno Becker <[email protected]>

AArch64: Integrate transpose into poly_tobytes()

656702b

Signed-off-by: Hanno Becker <[email protected]>

hanno-becker force-pushed the aarch64_poly_frombytes branch from 4f0f2e8 to 29a1909 Compare December 18, 2024 06:53

hanno-becker added and removed the `benchmark` label (this PR should be benchmarked in CI) Dec 18, 2024

hanno-becker marked this pull request as draft December 18, 2024 07:18

hanno-becker closed this Dec 18, 2024
