AVX-512 support for RSA Signing #1273

Merged: 36 commits, Sep 17, 2024
b9088fc
Use IFMA_AVX512 when possible for modular exponentiation.
pittma Aug 7, 2023
e6269ff
Add test coverage for consttime_x2 mod exp function
pittma Oct 23, 2023
6d2ece9
Add fuzzer coverage for BN_mod_exp_mont_consttime_x2
pittma Oct 23, 2023
e0ad9da
prevent empty translation units for compilers that don't like them
pittma Oct 30, 2023
024a9ec
properly handle AVX-512 build conditions
pittma Oct 31, 2023
cd2a3d1
fips builds require subsections
pittma Oct 31, 2023
d4d89fc
fix disallowed interaction with `OPENSSL_ia32_cap_P` in fips mode
pittma Nov 2, 2023
a0f3737
reset sections when they change for variable declaration
pittma Nov 2, 2023
8e55af5
include avx512ifma flag
pittma Nov 3, 2023
7d1ea20
handle AVX-512 mask register usage in fips delocation process
pittma Nov 15, 2023
407df8d
address review comments
pittma Jan 30, 2024
e67bbda
regen generated source
pittma Feb 1, 2024
b33709e
regenerate delocate parser
pittma Feb 1, 2024
0e7c607
AVX-512 RSA Signing: address first PR review
pittma Apr 10, 2024
b2d1327
Merge remote-tracking branch 'origin/main'
pittma Apr 10, 2024
14fefe0
Still export the parallel mod_exp implementation
pittma Apr 12, 2024
5e1c7ee
second set of review comments and documentation
pittma Apr 24, 2024
73d389d
fix generated source conflict
pittma Apr 24, 2024
087bf5c
Merge branch 'main' of github.com:aws/aws-lc into pmain
pittma Jul 25, 2024
c439bf0
address review 3 comments
pittma Jul 25, 2024
abe1124
Merge branch 'main' of github.com:aws/aws-lc
pittma Aug 7, 2024
37b4a4a
Merge branch 'main' of github.com:aws/aws-lc into pmain
pittma Sep 5, 2024
e06d8d0
further review comments
pittma Sep 4, 2024
bf9fc29
add ABI tests for new RSA AVX-512 assembly routines
pittma Sep 5, 2024
e626c2c
add dispatch tests for AVX-512 enabled RSA signing
pittma Sep 5, 2024
92b9e3f
fix dispatch test
pittma Sep 6, 2024
1055b42
Merge remote-tracking branch 'origin/main'
pittma Sep 6, 2024
58af762
Merge branch 'main' of github.com:aws/aws-lc
pittma Sep 9, 2024
56d8fd6
fix conditional build logic in dispatch test
pittma Sep 9, 2024
f925e7c
generated asm should properly exclude when using old assembler
pittma Sep 9, 2024
2473469
Merge branch 'main' of github.com:aws/aws-lc
pittma Sep 10, 2024
ef26ced
in ninja-based build, old assembler logic is already handled
pittma Sep 10, 2024
73b7b8f
Merge branch 'main' of github.com:aws/aws-lc
pittma Sep 10, 2024
506dced
Increasing the capacity of ubuntu2004_android_fips_static_release.
nebeid Sep 11, 2024
0dd53a1
Merge branch 'main' into main
nebeid Sep 11, 2024
f3715bb
Merge branch 'main' of github.com:aws/aws-lc
pittma Sep 16, 2024
42 changes: 0 additions & 42 deletions .github/workflows/mingw.yml

This file was deleted.

4 changes: 2 additions & 2 deletions crypto/fipsmodule/bn/asm/rsaz-2k-avx512.pl
@@ -482,8 +482,8 @@ sub amm52x20_x1_norm {

###############################################################################
# void extract_multiplier_2x20_win5(BN_ULONG *red_Y,
-# const BN_ULONG red_table[1 << EXP_WIN_SIZE][2][20],
-# int red_table_idx1, int red_table_idx2);
+# const BN_ULONG red_table[1 << EXP_WIN_SIZE][2][20],
+# int red_table_idx1, int red_table_idx2);
#
###############################################################################
{
5 changes: 2 additions & 3 deletions crypto/fipsmodule/bn/exponentiation.c
@@ -1272,7 +1272,8 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p,
// in_mont[i] - Montgomery multiplication context
// ctx - Bignum context.
//
-// The width of each base, exponent, and modulus must match.
+// The width of each base, exponent, and modulus must match and the
+// contexts are expected to be initialized.
int BN_mod_exp_mont_consttime_x2(BIGNUM *rr1, const BIGNUM *a1, const BIGNUM *p1,
const BIGNUM *m1, const BN_MONT_CTX *in_mont1,
BIGNUM *rr2, const BIGNUM *a2, const BIGNUM *p2,
@@ -1332,11 +1333,9 @@ int BN_mod_exp_mont_consttime_x2(BIGNUM *rr1, const BIGNUM *a1, const BIGNUM *p1

rr1->width = widthn;
rr1->neg = 0;
-bn_set_minimal_width(rr1);

rr2->width = widthn;
rr2->neg = 0;
-bn_set_minimal_width(rr2);
} else {
// rr1 = a1^p1 mod m1
ret = BN_mod_exp_mont_consttime(rr1, a1, p1, m1, ctx, in_mont1);
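
The hunk above shows `BN_mod_exp_mont_consttime_x2` falling back to `BN_mod_exp_mont_consttime` for `rr1` when the paired path is not taken. A minimal sketch of that dispatch-with-fallback shape is given here under loud assumptions: the `toy_*` names, the 32-bit operands, and the `fast_path_ok` flag are hypothetical stand-ins, the arithmetic is plain square-and-multiply rather than Montgomery form, and nothing in it is constant-time.

```c
#include <assert.h>
#include <stdint.h>

// Toy square-and-multiply: returns base^exp mod m for 0 < m < 2^32.
// NOT constant-time; it only stands in for the real bignum routines.
static uint32_t toy_mod_exp(uint32_t base, uint32_t exp, uint32_t m) {
  assert(m != 0);
  uint64_t result = 1 % m;
  uint64_t b = base % m;
  while (exp != 0) {
    if (exp & 1) {
      result = (result * b) % m;  // both factors < 2^32, so no overflow
    }
    b = (b * b) % m;
    exp >>= 1;
  }
  return (uint32_t)result;
}

// Hypothetical paired worker. Like the worker in the PR, it performs no
// argument checks of its own and relies on the dispatcher below.
static void toy_mod_exp_pair_unchecked(uint32_t out[2], const uint32_t base[2],
                                       const uint32_t exp[2],
                                       const uint32_t m[2]) {
  for (int i = 0; i < 2; i++) {
    out[i] = toy_mod_exp(base[i], exp[i], m[i]);
  }
}

// Dispatcher: validates inputs, uses the paired worker when allowed, and
// otherwise falls back to two independent single exponentiations.
// Returns 1 on success and 0 on failure.
static int toy_mod_exp_x2(uint32_t out[2], const uint32_t base[2],
                          const uint32_t exp[2], const uint32_t m[2],
                          int fast_path_ok) {
  if (m[0] == 0 || m[1] == 0) {
    return 0;  // reject invalid moduli before any worker runs
  }
  if (fast_path_ok) {
    toy_mod_exp_pair_unchecked(out, base, exp, m);
    return 1;
  }
  out[0] = toy_mod_exp(base[0], exp[0], m[0]);  // out[0] = base0^exp0 mod m0
  out[1] = toy_mod_exp(base[1], exp[1], m[1]);  // out[1] = base1^exp1 mod m1
  return 1;
}
```

Both paths must produce identical results; only the route taken differs, which is what makes a fallback safe to add.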
1 change: 0 additions & 1 deletion crypto/fipsmodule/bn/internal.h
@@ -840,7 +840,6 @@ void bn_little_endian_to_words(BN_ULONG *out, size_t out_len, const uint8_t *in,
// leading zeros.
void bn_words_to_little_endian(uint8_t *out, size_t out_len, const BN_ULONG *in, const size_t in_len);


#if defined(__cplusplus)
} // extern C
#endif
38 changes: 21 additions & 17 deletions crypto/fipsmodule/bn/rsaz_exp.h
@@ -35,11 +35,11 @@ extern "C" {
// the high bit set (it is 1024 bits wide). |RR| and |k0| must be |RR| and |n0|,
// respectively, extracted from |m_norm|'s |BN_MONT_CTX|. |storage_words| is a
// temporary buffer that must be aligned to |MOD_EXP_CTIME_ALIGN| bytes.
-void RSAZ_1024_mod_exp_avx2(uint64_t result[16], const uint64_t base_norm[16],
-const uint64_t exponent[16],
-const uint64_t m_norm[16], const uint64_t RR[16],
-uint64_t k0,
-uint64_t storage_words[MOD_EXP_CTIME_STORAGE_LEN]);
+void RSAZ_1024_mod_exp_avx2(BN_ULONG result[16], const BN_ULONG base_norm[16],
+const BN_ULONG exponent[16],
+const BN_ULONG m_norm[16], const BN_ULONG RR[16],
+BN_ULONG k0,
+BN_ULONG storage_words[MOD_EXP_CTIME_STORAGE_LEN]);

OPENSSL_INLINE int rsaz_avx2_capable(void) {
return CRYPTO_is_AVX2_capable();
@@ -65,31 +65,31 @@ OPENSSL_INLINE int rsaz_avx2_preferred(void) {

// rsaz_1024_norm2red_avx2 converts |norm| from |BIGNUM| to RSAZ representation
// and writes the result to |red|.
-void rsaz_1024_norm2red_avx2(uint64_t red[40], const uint64_t norm[16]);
+void rsaz_1024_norm2red_avx2(BN_ULONG red[40], const BN_ULONG norm[16]);

// rsaz_1024_mul_avx2 computes |a| * |b| mod |n| and writes the result to |ret|.
// Inputs and outputs are in Montgomery form, using RSAZ's representation. |k|
// is -|n|^-1 mod 2^64 or |n0| from |BN_MONT_CTX|.
-void rsaz_1024_mul_avx2(uint64_t ret[40], const uint64_t a[40],
-const uint64_t b[40], const uint64_t n[40], uint64_t k);
+void rsaz_1024_mul_avx2(BN_ULONG ret[40], const BN_ULONG a[40],
+const BN_ULONG b[40], const BN_ULONG n[40], BN_ULONG k);

// rsaz_1024_sqr_avx2 computes |a|^(2^|count|) mod |n| and writes the result to
// |ret|. Inputs and outputs are in Montgomery form, using RSAZ's
// representation. |k| is -|n|^-1 mod 2^64 or |n0| from |BN_MONT_CTX|.
-void rsaz_1024_sqr_avx2(uint64_t ret[40], const uint64_t a[40],
-const uint64_t n[40], uint64_t k, int count);
+void rsaz_1024_sqr_avx2(BN_ULONG ret[40], const BN_ULONG a[40],
+const BN_ULONG n[40], BN_ULONG k, int count);

// rsaz_1024_scatter5_avx2 stores |val| at index |i| of |tbl|. |i| must be
// positive and at most 31. It is treated as public. Note the table only uses 18
-// |uint64_t|s per entry instead of 40. It packs two 29-bit limbs into each
-// |uint64_t| and only stores 36 limbs rather than the padded 40.
-void rsaz_1024_scatter5_avx2(uint64_t tbl[32 * 18], const uint64_t val[40],
+// |BN_ULONG|s per entry instead of 40. It packs two 29-bit limbs into each
+// |BN_ULONG| and only stores 36 limbs rather than the padded 40.
+void rsaz_1024_scatter5_avx2(BN_ULONG tbl[32 * 18], const BN_ULONG val[40],
int i);
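
The scatter comment above says each table entry packs two 29-bit limbs into one 64-bit word and stores 36 limbs instead of the padded 40, which is where the 18 words per entry come from. A rough sketch of such packing follows. The bit layout (even limb in the low 32 bits, odd limb in the high 32 bits) and all `toy_*`/`TOY_*` names are assumptions made for illustration; the real assembly may arrange the limbs differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LIMB_BITS 29
#define TOY_LIMB_MASK ((UINT64_C(1) << TOY_LIMB_BITS) - 1)
#define TOY_PADDED_LIMBS 40                      // limbs in the working form
#define TOY_STORED_LIMBS 36                      // limbs that carry data
#define TOY_PACKED_WORDS (TOY_STORED_LIMBS / 2)  // 18 words per table entry

// Packs limbs 0..35 of |val| two per word. Assumed layout: limb 2i in the
// low 32 bits of word i, limb 2i+1 in the high 32 bits.
static void toy_pack_limbs(uint64_t packed[TOY_PACKED_WORDS],
                           const uint64_t val[TOY_PADDED_LIMBS]) {
  for (size_t i = 0; i < TOY_PACKED_WORDS; i++) {
    assert(val[2 * i] <= TOY_LIMB_MASK && val[2 * i + 1] <= TOY_LIMB_MASK);
    packed[i] = val[2 * i] | (val[2 * i + 1] << 32);
  }
}

// Inverse of toy_pack_limbs; the four padding limbs are restored as zero.
static void toy_unpack_limbs(uint64_t val[TOY_PADDED_LIMBS],
                             const uint64_t packed[TOY_PACKED_WORDS]) {
  for (size_t i = 0; i < TOY_PACKED_WORDS; i++) {
    val[2 * i] = packed[i] & UINT64_C(0xffffffff);
    val[2 * i + 1] = packed[i] >> 32;
  }
  for (size_t i = TOY_STORED_LIMBS; i < TOY_PADDED_LIMBS; i++) {
    val[i] = 0;
  }
}
```

With this layout a 32-entry table needs 32 * 18 words, matching the `tbl[32 * 18]` parameter in the declaration.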

// rsaz_1024_gather5_avx2 loads index |i| of |tbl| and writes it to |val|. |i|
// must be positive and at most 31. It is treated as secret. |tbl| must be
// aligned to 32 bytes.
-void rsaz_1024_gather5_avx2(uint64_t val[40], const uint64_t tbl[32 * 18],
+void rsaz_1024_gather5_avx2(BN_ULONG val[40], const BN_ULONG tbl[32 * 18],
int i);
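
Because the gather index is treated as secret, the lookup must not reveal it through its memory access pattern. One common way to achieve that is to read every entry and keep only the wanted one with an arithmetic mask. The sketch below illustrates that general technique with hypothetical `toy_*` names; it is not a transcription of the AVX2 routine, and a production version would also need to confirm that the compiler does not reintroduce secret-dependent branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_ENTRIES 32
#define TOY_ENTRY_WORDS 18

// Copies entry |idx| of |tbl| into |out| while touching every entry, so the
// sequence of memory reads does not depend on |idx|.
// Precondition: idx < TOY_TABLE_ENTRIES.
static void toy_ct_gather(uint64_t out[TOY_ENTRY_WORDS],
                          const uint64_t tbl[TOY_TABLE_ENTRIES * TOY_ENTRY_WORDS],
                          uint32_t idx) {
  assert(out != NULL && tbl != NULL);  // pointers are public, safe to check
  for (size_t w = 0; w < TOY_ENTRY_WORDS; w++) {
    out[w] = 0;
  }
  for (uint32_t i = 0; i < TOY_TABLE_ENTRIES; i++) {
    // diff is zero only when i == idx. Subtracting 1 then wraps to all ones,
    // so bit 31 is set exactly in the matching case.
    uint32_t diff = i ^ idx;
    uint64_t mask = (uint64_t)0 - (uint64_t)((diff - 1u) >> 31);
    for (size_t w = 0; w < TOY_ENTRY_WORDS; w++) {
      out[w] |= tbl[(size_t)i * TOY_ENTRY_WORDS + w] & mask;
    }
  }
}
```

The scatter side can index the table directly because its index is documented as public.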

// rsaz_1024_red2norm_avx2 converts |red| from RSAZ to |BIGNUM| representation
@@ -98,7 +98,7 @@ void rsaz_1024_gather5_avx2(uint64_t val[40], const uint64_t tbl[32 * 18],
// WARNING: The result of this operation may not be fully reduced. |norm| may be
// the modulus instead of zero. This function should be followed by a call to
// |bn_reduce_once|.
-void rsaz_1024_red2norm_avx2(uint64_t norm[16], const uint64_t red[40]);
+void rsaz_1024_red2norm_avx2(BN_ULONG norm[16], const BN_ULONG red[40]);

#if !defined(MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX)
#define RSAZ_512_ENABLED
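
The `norm2red` and `red2norm` declarations above convert between sixteen 64-bit words and forty limbs, and the scatter comment gives the limb size as 29 bits. A minimal sketch of that radix change follows. It assumes limb `j` holds bits `29*j .. 29*j+28` in little-endian order and that every limb is already below 2^29 on the way back; the real routines may order limbs differently and must cope with carries, so the `toy_*` functions here are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NORM_WORDS 16  // 16 * 64 = 1024 bits
#define TOY_RED_LIMBS 40   // 36 data limbs plus 4 zero padding limbs
#define TOY_DATA_LIMBS 36  // ceil(1024 / 29)
#define TOY_RADIX_BITS 29
#define TOY_RADIX_MASK ((UINT64_C(1) << TOY_RADIX_BITS) - 1)

// Splits a 1024-bit little-endian number into 29-bit limbs.
static void toy_norm2red(uint64_t red[TOY_RED_LIMBS],
                         const uint64_t norm[TOY_NORM_WORDS]) {
  for (size_t j = 0; j < TOY_RED_LIMBS; j++) {
    size_t bit = j * TOY_RADIX_BITS;
    size_t word = bit / 64;
    unsigned shift = (unsigned)(bit % 64);
    uint64_t v = 0;
    if (word < TOY_NORM_WORDS) {
      v = norm[word] >> shift;
      // The limb straddles two words when fewer than 29 bits remain.
      if (shift > 64 - TOY_RADIX_BITS && word + 1 < TOY_NORM_WORDS) {
        v |= norm[word + 1] << (64 - shift);
      }
    }
    red[j] = v & TOY_RADIX_MASK;
  }
}

// Reassembles the 1024-bit number. Assumes every limb is below 2^29.
static void toy_red2norm(uint64_t norm[TOY_NORM_WORDS],
                         const uint64_t red[TOY_RED_LIMBS]) {
  for (size_t w = 0; w < TOY_NORM_WORDS; w++) {
    norm[w] = 0;
  }
  for (size_t j = 0; j < TOY_DATA_LIMBS; j++) {
    assert(red[j] <= TOY_RADIX_MASK);
    size_t bit = j * TOY_RADIX_BITS;
    size_t word = bit / 64;
    unsigned shift = (unsigned)(bit % 64);
    norm[word] |= red[j] << shift;
    if (shift > 64 - TOY_RADIX_BITS && word + 1 < TOY_NORM_WORDS) {
      norm[word + 1] |= red[j] >> (64 - shift);
    }
  }
}
```

Limbs narrower than the machine word leave headroom for accumulating partial products before carries are propagated, which is the usual motivation for a redundant representation like this one.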
@@ -132,6 +132,10 @@ void rsaz_1024_red2norm_avx2(uint64_t norm[16], const uint64_t red[40]);
//
// \return 0 in case of failure,
// 1 in case of success.
+//
+// NB: This function does not check its arguments; its caller,
+// `BN_mod_exp_mont_consttime_x2`, does. Use that function rather than
+// calling this one directly.
int RSAZ_mod_exp_avx512_x2(uint64_t *res1,
const uint64_t *base1,
const uint64_t *exponent1,
@@ -197,8 +201,8 @@ void rsaz_amm52x20_x2_ifma256(uint64_t *out, const uint64_t *a,
// base^i, where i = 0..2^EXP_WIN_SIZE-1
//
// The input |red_table| contains precomputations for two independent
-// base values. |red_table_idx1| and |red_table_idx2| are
-// corresponding power indexes.
+// base values and two independent moduli. The precomputed powers of
+// the base values are stored contiguously in the table.
//
// Extracted value (output) is two 20-digit numbers in radix 2^52.
//