AVX-512 support for RSA Signing #1273

pittma · 2023-10-30T18:35:22Z

Description of changes:

This patch adds AVX-512 support for RSA 2k, 3k and 4k signing. It is built around the use of AVX512_IFMA within the (Almost) Montgomery Multiplication implementation that comprises the modular exponentiation part of the RSA algorithm. It is ported from the OpenSSL patch.
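For readers unfamiliar with AVX512_IFMA, here is a minimal scalar sketch of what its two instructions, `vpmadd52luq` and `vpmadd52huq`, compute in each 64-bit lane: a 52x52-bit multiply whose low or high 52 bits are added to a 64-bit accumulator. The names `mul52`, `madd52lo` and `madd52hi` are illustrative and not part of this patch, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Full 104-bit product of the low 52 bits of a and b. */
static unsigned __int128 mul52(uint64_t a, uint64_t b) {
  return (unsigned __int128)(a & MASK52) * (b & MASK52);
}

/* Per-lane model of vpmadd52luq: acc += low 52 bits of the product. */
static uint64_t madd52lo(uint64_t acc, uint64_t a, uint64_t b) {
  return acc + ((uint64_t)mul52(a, b) & MASK52);
}

/* Per-lane model of vpmadd52huq: acc += bits 52..103 of the product. */
static uint64_t madd52hi(uint64_t acc, uint64_t a, uint64_t b) {
  return acc + (uint64_t)(mul52(a, b) >> 52);
}
```

Because each digit occupies only 52 bits of a 64-bit lane, the 12 spare bits can absorb carries across several accumulations, which is why the operands are first converted to a redundant radix-2^52 representation.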

Results from the provided speed tests, with and without this patch:

There is currently no support for 8k, so there is no change there. However, this could be a follow-up if there is interest.

Call-outs:

This patch is primarily additive, modulo a small logic change that occurs here: previously, the calls to mod_montgomery and BN_mod_exp_mont_consttime were interleaved. The intermediate value of r1 is needed for the first exponentiation call, so to make the two exponentiations run in parallel we create a new BIGNUM on the context (r2) to hold the second intermediate value.
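The reordering can be illustrated with a toy RSA-CRT computation on machine words. This is only a sketch under stated assumptions: `mod_exp`, `mod_exp_x2` and `rsa_crt_toy` are hypothetical names, the "dual" call simply runs two scalar exponentiations, and all moduli must stay below 2^32 so that products fit in 64 bits. The point it shows is that both reduced inputs (`r1` and `r2`) have to exist before the single dual call, whereas the interleaved form could reuse one temporary.

```c
#include <stdint.h>

/* Square-and-multiply; operands must stay below 2^32 so products fit. */
static uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t m) {
  uint64_t result = 1 % m;
  base %= m;
  while (exp > 0) {
    if (exp & 1) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1;
  }
  return result;
}

/* Hypothetical stand-in for a dual exponentiation: two independent
 * results are produced by one call. */
static void mod_exp_x2(uint64_t *out1, uint64_t b1, uint64_t e1, uint64_t m1,
                       uint64_t *out2, uint64_t b2, uint64_t e2, uint64_t m2) {
  *out1 = mod_exp(b1, e1, m1);
  *out2 = mod_exp(b2, e2, m2);
}

/* Toy CRT private-key operation: returns c^d mod (p*q). */
static uint64_t rsa_crt_toy(uint64_t c, uint64_t p, uint64_t q,
                            uint64_t dp, uint64_t dq, uint64_t qinv) {
  uint64_t r1 = c % q; /* first reduced input */
  uint64_t r2 = c % p; /* second reduced input; needs its own temporary */
  uint64_t m1, m2;
  mod_exp_x2(&m1, r1, dq, q, &m2, r2, dp, p);
  /* Garner recombination: h = qinv * (m2 - m1) mod p, result = m1 + h*q. */
  uint64_t h = (qinv * ((m2 + p - m1 % p) % p)) % p;
  return m1 + h * q;
}
```

With the textbook parameters p = 61, q = 53, e = 17 (so dp = 53, dq = 49, qinv = 38), `rsa_crt_toy(2790, 61, 53, 53, 49, 38)` recovers the message 65.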

Testing:

I added coverage for the fuzzer and borrowed a couple of test cases from the existing mod_exp tests to hit the new BN_mod_exp_mont_consttime_x2 function. I'm more than happy to pull out more cases from those tests, or whatever else is suggested here, just let me know!

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

crypto/fipsmodule/CMakeLists.txt

pittma · 2023-11-03T18:23:13Z

Thanks for catching that @dkostic; updated.

crypto/fipsmodule/CMakeLists.txt

codecov-commenter · 2023-11-16T00:35:21Z

Codecov Report

Attention: Patch coverage is 19.48052% with 310 lines in your changes missing coverage. Please review.

Project coverage is 78.30%. Comparing base (9d21f38) to head (f3715bb).

Files with missing lines	Patch %	Lines
crypto/fipsmodule/bn/rsaz_exp_x2.c	0.00%	249 Missing ⚠️
crypto/fipsmodule/bn/exponentiation.c	19.04%	34 Missing ⚠️
crypto/fipsmodule/bn/bn_test.cc	48.83%	21 Missing and 1 partial ⚠️
crypto/impl_dispatch_test.cc	88.09%	3 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1273      +/-   ##
==========================================
- Coverage   78.51%   78.30%   -0.22%     
==========================================
  Files         583      584       +1     
  Lines       98809    99188     +379     
  Branches    14159    14189      +30     
==========================================
+ Hits        77583    77666      +83     
- Misses      20598    20892     +294     
- Partials      628      630       +2

nebeid · 2023-12-01T21:35:22Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+#include <string.h>
+#include "rsaz_exp.h"
+
+# define ALIGN_OF(ptr, boundary) \


We can use

aws-lc/crypto/poly1305/poly1305.c

Line 45 in 7e6aef8

static inline struct poly1305_state_st *poly1305_aligned_state(

nebeid · 2023-12-04T19:04:46Z

crypto/fipsmodule/bn/exponentiation.c

+                                 const BIGNUM *m2, const BN_MONT_CTX *in_mont2,
+                                 BN_CTX *ctx)
+{
+    int ret = 0;


Can the indentation be set to 2 spaces?

nebeid · 2023-12-05T16:49:57Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+    return (bitsize + digit_size - 1) / digit_size;
+}
+
+/*


I suggest these declarations be moved to crypto/fipsmodule/bn/internal.h and be documented there.

nebeid · 2023-12-05T16:50:36Z

crypto/fipsmodule/bn/asm/rsaz-2k-avx512.pl

+###############################################################################
+{
+# input parameters ("%rdi","%rsi","%rdx","%rcx","%r8")
+my ($res,$a,$b,$m,$k0) = @_6_args_universal_ABI;


It's not clear to me if this takes win64 into account.

I think you're right, I don't think it is either! I've updated this with a ternary check.

nebeid · 2023-12-05T20:55:24Z

crypto/fipsmodule/bn/asm/rsaz-2k-avx512.pl

+# Registers mapping for normalization.
+my ($T0,$T0h,$T1,$T1h,$T2) = ("$zero", "$Bi", "$Yi", map("%ymm$_", (25..26)));
+
+sub amm52x20_x1() {


If this is Algorithm 7 (Fig 5) in the J Cryptographic Eng. (2012) paper or Alg 3 in iacr 2011-239, can the exact algorithm be cited? It would be great if the various blocks of steps were annotated with the corresponding steps from the algorithm.

Hi @nebeid, could I plan to follow up with a second PR with the documentation?

Thank you, Dan. We will be looking at the delocate issue on Arm. If, meanwhile, you can shed any light on the algorithm steps, that would be much appreciated.

pittma · 2024-01-30T15:43:09Z

Looks like Linux CI runs are failing. Would someone mind sharing the details on those failures?

dkostic · 2024-01-30T19:35:19Z

FAILED: crypto/fipsmodule/CMakeFiles/bcm_hashunset.dir/bcm-delocated.S.o
--
2390 | /usr/bin/clang-6.0 -DBORINGSSL_FIPS -DBORINGSSL_IMPLEMENTATION -DFIPS_ENTROPY_SOURCE_PASSIVE -Isymbol_prefix_include -I../include -Wa,--noexecstack -O3 -DNDEBUG -fPIC   -mavx512f -mavx512bw -mavx512dq -mavx512vl -MD -MT crypto/fipsmodule/CMakeFiles/bcm_hashunset.dir/bcm-delocated.S.o -MF crypto/fipsmodule/CMakeFiles/bcm_hashunset.dir/bcm-delocated.S.o.d -o crypto/fipsmodule/CMakeFiles/bcm_hashunset.dir/bcm-delocated.S.o -c crypto/fipsmodule/bcm-delocated.S
2391 | crypto/fipsmodule/bcm-delocated.S:1300482:2: error: instruction requires: AVX-512 IFMA ISA
2392 | vpmadd52luq 0(%rsi),%ymm1,%ymm3
...

pittma · 2024-01-30T19:49:41Z

Thanks! Looks like the IFMA flag isn't being passed in this command. Building and running tests seems okay on my SPR dev machine, though. Maybe I've missed a build conditional somewhere…

dkostic · 2024-02-01T16:01:58Z

I think you need to add it here as well: https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/CMakeLists.txt#L367

dkostic · 2024-08-09T23:14:37Z

.github/workflows/mingw.yml

@@ -0,0 +1,42 @@
+name: MinGW


Why do we need this? Is this not covered by our existing Intel SDE tests?

I'm not sure how this file got into this PR. This appears to be a duplicate of CI tests we already have: https://github.com/aws/aws-lc/blob/main/.github/workflows/windows-alt.yml#L11

It must have come in with an intermediate merge somewhere along the way. I will remove it.

pittma · 2024-08-21T21:42:51Z

I expect to get back to this next week—chaos reigns over here at the moment.

pittma · 2024-09-04T20:12:26Z

I've made it through @nebeid and @dkostic's last reviews and I'll start the testing / merging process here in a bit. I should have something pushed up in the next day or so.

That leaves:

ABI test
Dispatch test

Which I will get started on as soon as I get the current state of the patch in order.

As for follow-ups:

I think there may be some opportunity for simplification / lucidity improvements, particularly in rsaz_exp_x2.c, but those will of course need to be tested for performance regressions.
Unrolled single multiplication vs. dual multiplication.

nebeid · 2024-09-05T21:35:57Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+    //   - We have AMM(t, 2^k) = R^4 * 2^{4*(s-n)} / R'^2 mod m
+    //                         = R'^4 / R'^2 mod m
+    //                         = R'^2 mod m
+


Is there a reason the example wasn't added back? It was in the original commit, I just reworded it. I think it was a helpful illustration.
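As a stand-in for the missing example, the exponent bookkeeping in the quoted comment can be checked with a few lines of arithmetic. This is a sketch under assumptions that are not spelled out in the visible hunk: `n` is the modulus bit length with R = 2^n, `s` is 52 times the number of 52-bit digits with R' = 2^s, k = 4 * (s - n), and each AMM call multiplies its operands and divides by R'. The code tracks only the base-2 logarithm of each quantity; the function names are illustrative.

```c
/* Number of 52-bit digits needed for a modlen-bit value. */
static int digits52(int modlen) { return (modlen + 51) / 52; }

/* log2 of the value produced by the two AMM steps, starting from
 * RR = R^2 = 2^(2n). Assumes modlen is a multiple of 64. */
static int rr_prime_log2(int modlen) {
  int n = modlen;                /* log2(R) */
  int s = 52 * digits52(modlen); /* log2(R') */
  int k = 4 * (s - n);           /* log2 of the correction coefficient */
  int t = 4 * n - s;             /* AMM(RR, RR) = R^4 / R' */
  return t + k - s;              /* AMM(t, 2^k) = R^4 * 2^k / R'^2 */
}
```

For modlen = 1024 this gives s = 1040 and k = 64, and the result is 2^2080, which equals R'^2 as the comment claims. The same identity holds for 1536 (2^3120) and 2048 (2^4160).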

nebeid · 2024-09-05T21:52:21Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

-     * Number of word-size (uint64_t) digits to store in redundant
-     * representation.
-     */
+    // Number of word-size (uint64_t) digits to store in redundant


Suggested change

-    // Number of word-size (uint64_t) digits to store in redundant
+    // Number of word-size (uint64_t) digits to store values in redundant

nebeid · 2024-09-05T21:55:58Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+    amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1);
+    amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1);


Suggested change

-    amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1);
-    amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1);
+    amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1); // (1) for m1
+    amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1); // (2) for m1

nebeid · 2024-09-05T21:57:11Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+    amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2);
+    amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2);


Suggested change

-    amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2);
-    amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2);
+    amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2); // (1) for m2
+    amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2); // (2) for m2

nebeid · 2024-09-05T22:01:54Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

@@ -120,11 +121,11 @@ int RSAZ_mod_exp_avx512_x2(uint64_t *res1,
    uint64_t *storage = NULL;
    uint64_t *storage_aligned = NULL;
    int storage_len_bytes = 7 * regs_capacity * sizeof(uint64_t)
-                           + 64 /* alignment */;
+                           + 64;


Suggested change

-                           + 64;
+                           + 64; // alignment

The added 64 is for alignment, right?
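The usual reason for over-allocating by 64 bytes is to be able to round the returned pointer up to the next 64-byte boundary without running past the end of the buffer. A minimal sketch of that pattern follows; `align_up_64` is a hypothetical helper, not the macro or function used in the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round ptr up to the next 64-byte boundary. The caller must have
 * allocated at least 63 spare bytes beyond what it intends to use. */
static uint64_t *align_up_64(void *ptr) {
  uintptr_t addr = (uintptr_t)ptr;
  return (uint64_t *)((addr + 63) & ~(uintptr_t)63);
}
```

A caller would allocate `len + 64` bytes, keep the original pointer for `free`, and work only through the aligned pointer. Rounding up skips at most 63 bytes, so `len` usable bytes always remain.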

nebeid · 2024-09-05T22:08:41Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

+	// `rem` is { 1024, 1536, 2048 } % 5 which is { 4, 1, 3 }
+        // respectively.
+        //
+        // If this assertion ever fails the fix above is easy.


Suggested change

-        // If this assertion ever fails the fix above is easy.
+        // If this assertion ever fails then we should set this easy fix
+        // exp_bit_no = modlen - exp_win_size

Is that what's intended? Because the change removed "the fix above".

nebeid · 2024-09-05T22:10:58Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

-                     * Get additional bits from then next quadword
-                     * when 64-bit boundaries are crossed.
-                     */
+                    red_table_idx_1 = expz[exp_chunk_no + 0 * (exp_digits + 1)];


Suggested change

-                    red_table_idx_1 = expz[exp_chunk_no + 0 * (exp_digits + 1)];
+                    red_table_idx_1 = expz[EXP_CHUNK(0)];

nebeid · 2024-09-05T22:14:10Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

    {
-        const int rem = modulus_bitsize % exp_win_size;
-        const BN_ULONG table_idx_mask = exp_win_mask;
+        const int rem = modlen % exp_win_size;


(if it's correct)

Suggested change

-        const int rem = modlen % exp_win_size;
+        // Find the location of the 5-bit window in the exponent which is stored
+        // in 64-bit digits. Left pad it with 0s to form a 64-bit digit to become
+        // an index in the precomputed table.
+        // The window location in the exponent is identified by its least
+        // significant bit `exp_bit_no`.
+        const int rem = modlen % exp_win_size;
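The window lookup described in the proposed comment can be sketched as follows. This is an illustrative reconstruction, not the code from the patch: `window_at` is a hypothetical name, and the exponent array is assumed to carry one extra zero digit at the end so that reading `exp[chunk + 1]` is always valid, including when the window straddles a 64-bit boundary.

```c
#include <stdint.h>

#define EXP_WIN_SIZE 5
#define EXP_WIN_MASK ((UINT64_C(1) << EXP_WIN_SIZE) - 1)

/* Return the 5-bit window whose least significant bit is exp_bit_no.
 * exp must have one readable digit past the last digit that holds
 * exponent bits. */
static uint64_t window_at(const uint64_t *exp, int exp_bit_no) {
  int chunk = exp_bit_no / 64;
  int shift = exp_bit_no % 64;
  uint64_t idx = exp[chunk] >> shift;
  if (shift > 64 - EXP_WIN_SIZE) {
    /* The window crosses a 64-bit boundary: take the remaining bits
     * from the next digit. */
    idx |= exp[chunk + 1] << (64 - shift);
  }
  return idx & EXP_WIN_MASK;
}
```

The mask zeroes everything above the low 5 bits, so the result is a 64-bit value in the range 0..31 that can be used directly as a table index.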

nebeid · 2024-09-05T22:17:04Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

                }
                {
-                    red_table_idx_1 = expz[exp_chunk_no + 1 * (exp_digits + 1)];
-                    T = expz[exp_chunk_no + 1 + 1 * (exp_digits + 1)];
+                    red_table_idx_2 = expz[exp_chunk_no + 1 * (exp_digits + 1)];


Suggested change

-                    red_table_idx_2 = expz[exp_chunk_no + 1 * (exp_digits + 1)];
+                    red_table_idx_2 = expz[EXP_CHUNK(1)];
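The definition of `EXP_CHUNK` is not shown in this thread. Judging only from the two expressions it replaces, it presumably expands to `exp_chunk_no + (i) * (exp_digits + 1)`, which is the index of the current chunk within the i-th of two exponents stored back to back, each padded with one extra digit. A function form of that assumption, with the hypothetical name `exp_chunk_index`:

```c
#include <stddef.h>

/* Index into a buffer holding two exponents back to back, each
 * occupying exp_digits + 1 digits. `which` selects the exponent
 * (0 or 1) and chunk_no selects the digit within it. */
static size_t exp_chunk_index(size_t which, size_t chunk_no,
                              size_t exp_digits) {
  return chunk_no + which * (exp_digits + 1);
}
```

With 16 exponent digits, chunk 3 of the first exponent sits at index 3 and chunk 3 of the second at index 20.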

nebeid · 2024-09-05T22:22:19Z

crypto/fipsmodule/bn/rsaz_exp_x2.c

            }

-            /* Series of squaring */
+            // Series of squaring


#1273 (comment)

nebeid · 2024-09-06T15:41:00Z

crypto/impl_dispatch_test.cc

+	uint64_t k0_2 = 0;
+        int modlen = 0;
+
+        RSAZ_mod_exp_avx512_x2(&res1, &base1, &exp1, &m1, &rr1, k0_1,


I think we should be calling BN_mod_exp_mont_consttime_x2 in exponentiation.c to make sure this function gets called. That may be why the test fails with the output below: the function is being called when the conditions in flag.second are false.

[ RUN      ] ImplDispatchTest.BN_mod_exp_mont_consttime_x2
../crypto/impl_dispatch_test.cc:105: Failure
Expected equality of these values:
  flag.second
    Which is: false
  BORINGSSL_function_hit[flag.first] == 1
    Which is: true
Google Test trace:
../crypto/impl_dispatch_test.cc:103: 8

Of course you're right. I'm not sure what I was thinking here.

I even named the test the right thing! But got mixed up while filling it in. Anyways, 92b9e3f fixes this, and with that I think we're caught up.

nebeid · 2024-09-10T14:42:08Z

crypto/impl_dispatch_test.cc

+        BN_CTX_end(ctx);
+        BN_MONT_CTX_free(mont1);
+        BN_MONT_CTX_free(mont2);


Suggested change

-        BN_CTX_end(ctx);
-        BN_MONT_CTX_free(mont1);
-        BN_MONT_CTX_free(mont2);
+        BN_MONT_CTX_free(mont1);
+        BN_MONT_CTX_free(mont2);
+        BN_CTX_end(ctx);
+        BN_CTX_free(ctx);

dkostic · 2024-09-16T18:39:05Z

tests/ci/cdk/cdk/codebuild/github_ci_android_omnibus.yaml

@@ -47,5 +47,5 @@ batch:
      env:
        type: LINUX_CONTAINER
        privileged-mode: true
-        compute-type: BUILD_GENERAL1_MEDIUM
+        compute-type: BUILD_GENERAL1_LARGE


I guess this is a leftover of a merge with main?

@nebeid actually pushed this change in 506dced. I can't see the Android CI runs, I think, but when she pushed this it started passing.

That's right. The team advised increasing the capacity for that particular Android cross build, which wasn't passing.

pittma · 2024-09-16T22:13:09Z

Pushing a merge commit "dismissed" your approval @nebeid, that was not my intention!

pittma requested a review from a team as a code owner October 30, 2023 18:35

skmcgrail requested review from dkostic and nebeid October 31, 2023 17:18

pittma force-pushed the main branch 2 times, most recently from 1ceedcf to 676b064 Compare November 2, 2023 22:01

dkostic reviewed Nov 3, 2023

pittma force-pushed the main branch from 676b064 to 057b159 Compare November 3, 2023 18:22

dkostic reviewed Nov 7, 2023

pittma force-pushed the main branch 2 times, most recently from f5236af to 91881d5 Compare November 15, 2023 19:56

nebeid reviewed Dec 5, 2023

pittma force-pushed the main branch 3 times, most recently from 7ede753 to e3ca1f5 Compare January 30, 2024 00:11

torben-hansen added the reviewers-assigned label Jan 30, 2024

pittma force-pushed the main branch from e3ca1f5 to af9ccfb Compare February 1, 2024 19:10

pittma added 7 commits February 1, 2024 11:11

Use IFMA_AVX512 when possible for modular exponentiation.

b9088fc

Add test coverage for consttime_x2 mod exp function

e6269ff

Add fuzzer coverage for BN_mod_exp_mont_consttime_x2

6d2ece9

prevent empty translation units for compilers that don't like them

e0ad9da

properly handle AVX-512 build conditions

024a9ec

fips builds require subsections

cd2a3d1

fix disallowed interaction with OPENSSL_ia32_cap_P in fips mode

d4d89fc

The fipstool delocation only allows the use of `lea` when interacting with this symbol. This commit uses `lea` and `r11` as required by the delocation process.

dkostic reviewed Aug 9, 2024

pittma added 3 commits September 5, 2024 09:57

Merge branch 'main' of github.com:aws/aws-lc into pmain

37b4a4a

further review comments

e06d8d0

add ABI tests for new RSA AVX-512 assmebly routines

bf9fc29

nebeid reviewed Sep 5, 2024

add dispatch tests for AVX-512 enabled RSA signing

e626c2c

nebeid reviewed Sep 6, 2024

pittma added 5 commits September 6, 2024 12:09

fix dispatch test

92b9e3f

Merge remote-tracking branch 'origin/main'

1055b42

Merge branch 'main' of github.com:aws/aws-lc

58af762

fix conditional build logic in dispatch test

56d8fd6

generated asm should properly exclude when using old assembler

f925e7c

nebeid reviewed Sep 10, 2024

pittma and others added 5 commits September 10, 2024 09:51

Merge branch 'main' of github.com:aws/aws-lc

2473469

in ninja-based build, old assembler logic is already handled

ef26ced

Merge branch 'main' of github.com:aws/aws-lc

73b7b8f

Increasing the capacity of ubuntu2004_android_fips_static_release.

506dced

Merge branch 'main' into main

0dd53a1

nebeid previously approved these changes Sep 12, 2024

dkostic reviewed Sep 16, 2024

Merge branch 'main' of github.com:aws/aws-lc

f3715bb

pittma dismissed nebeid’s stale review via f3715bb September 16, 2024 22:09

nebeid approved these changes Sep 17, 2024

dkostic approved these changes Sep 17, 2024

nebeid merged commit e22cf50 into aws:main Sep 17, 2024
108 of 110 checks passed

smittals2 mentioned this pull request Sep 17, 2024

prepare for v1.35.0 release #1853

Merged
