Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EC_Scalar and EC_AffinePoint types #4042

Merged
merged 10 commits into from
Jul 10, 2024
Merged

Add EC_Scalar and EC_AffinePoint types #4042

merged 10 commits into from
Jul 10, 2024

Conversation

randombit
Copy link
Owner

@randombit randombit commented May 5, 2024

This is the other side of #3979 where we push the interface that's exposed to the rest of the library to a non-BigInt oriented model.

#4027

@randombit randombit requested a review from reneme May 5, 2024 19:28
@randombit randombit changed the title Jack/new ec types Add EC_Scalar and EC_AffinePoint types May 5, 2024
@randombit randombit force-pushed the jack/new-ec-types branch 2 times, most recently from 03b58cd to bae2ce8 Compare May 5, 2024 19:54
@coveralls
Copy link

coveralls commented May 5, 2024

Coverage Status

coverage: 91.693% (-0.03%) from 91.726%
when pulling 8042e9d on jack/new-ec-types
into 7fad1d2 on master.

@randombit randombit force-pushed the jack/new-ec-types branch 4 times, most recently from a70db6d to b09a762 Compare May 6, 2024 11:55
@randombit randombit force-pushed the jack/new-ec-types branch 6 times, most recently from 4db015d to b594b6b Compare May 19, 2024 11:21
@randombit randombit added this to the Botan 3.5.0 milestone May 22, 2024
@randombit randombit force-pushed the jack/new-ec-types branch 3 times, most recently from d5fa5d9 to 94c81d6 Compare May 29, 2024 10:17
src/lib/pubkey/ec_group/ec_apoint.h Outdated Show resolved Hide resolved
src/lib/pubkey/ec_group/ec_group.h Outdated Show resolved Hide resolved
src/lib/pubkey/ec_group/ec_inner_data.h Outdated Show resolved Hide resolved
src/lib/pubkey/ec_group/ec_scalar.h Outdated Show resolved Hide resolved
@randombit randombit force-pushed the jack/new-ec-types branch 4 times, most recently from 9d2a5c2 to a66bee9 Compare June 11, 2024 09:38
@reneme
Copy link
Collaborator

reneme commented Jun 11, 2024

I will go through this in the coming days. Now with the underlying PR merged. 👍

Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first review pass. Looks really good! Especially the simplified algorithm implementations!

Left some nits and suggestions. There's one finding that might point to a missing test, though.

src/lib/pubkey/ec_group/ec_inner_data.h Outdated Show resolved Hide resolved

BufferStuffer stuffer(bytes);
stuffer.append(y_is_odd ? 0x03 : 0x02);
stuffer.append(std::span{m_xy}.last(fe_bytes));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compressed encoding is oddness-header + x-value, no? Surprised that it didn't fail any tests.

Suggested change
stuffer.append(std::span{m_xy}.last(fe_bytes));
stuffer.append(std::span{m_xy}.first(fe_bytes));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also: #4042 (comment)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL WTF that definitely should have caused test failures, need to look at this

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK the reason this was missed is that currently this serialization logic in unused, because keys/etc are still using EC_Point.

We're in for a somewhat awkward situation in the short term as we'll have 3 distinct elliptic curve point types - original flavor EC_Point, this EC_AffinePoint, plus the (internal only) pcurves AffinePoint, with a lot of redundant/duplicated logic.

For 3.6.0 onward the situation improves

  • The various pubkey algorithms switch to storing their points as EC_AffinePoint. At that point EC_Point becomes unused except for testing; unfortunately we can't remove it completely until Botan 4. Maybe we'll be able to segregate it into a deprecated submodule of ec_group though.

  • EC_AffinePoint gains a bridge to pcurve. This affords removing some redundancies between pcurves and what we actually need in practice. For example pcurves has scalar_from_bits_with_trunc but this is just modular reduction plus some shifts. It'll likely be simpler, once the bridge exists, to remove that and do the shifting in some higher point in the call stack so the logic is shared between the pcurves and BigInt based approaches.

src/lib/pubkey/ec_group/ec_inner_data.h Outdated Show resolved Hide resolved
src/lib/pubkey/ec_group/ec_inner_data.h Outdated Show resolved Hide resolved
src/lib/pubkey/ec_group/ec_inner_data.h Outdated Show resolved Hide resolved
src/lib/pubkey/gost_3410/gost_3410.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/gost_3410/gost_3410.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/gost_3410/gost_3410.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/rfc6979/rfc6979.cpp Outdated Show resolved Hide resolved
Comment on lines 29 to 33
const BigInt& RFC6979_Nonce_Generator::nonce_for(const BigInt& m) {
m.serialize_to(std::span{m_rng_in}.subspan(m_rlen));
std::vector<uint8_t> m_bytes(m_rlen);
m.serialize_to(m_bytes);
return this->nonce_for(m_bytes);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure its worth omitting the extra allocation, but you could serialize straight into m_rng_in and then establish an internal method that just assumes m_rng_in is properly set up. Similarly, for the std::span<> overload, that could just copy_mem into m_rng_in and then also call this private method.

Like so:

m.serialize_to(std::span{m_rng_in}.subspan(m_rlen));
return this->generate_nonce();

While we're on it: it might be worthwhile to keep the span "std::span{m_rng_in}.subspan(m_rlen)" as a member variable to avoid misuse (by malforming the subspan() invocation) here and in the other overload

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I think the original version of this was written before we finalized on the write-to-span idiom for BigInt.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looked at this more and it doesn't seem worth bothering with.

There are two uses of RFC 6979 right now, ECDSA and DSA.

  • ECDSA uses the bytes variant (or will, with this PR merged)
  • DSA actually sets up a complete new RFC6979_Nonce_Generator object for each signature! Saving one extra allocation is the least of our worries 🙈

Any future users of RFC 6979 (eg ECGDSA or etc) will use the bytes interface, so this second overload of nonce_for really is (will become) DSA specific at this point.

I think this can be cleaned up further - in particular in the long run I'd like to avoid the multiple conversions between BigInt and EC_Scalar implied by

   const auto k = EC_Scalar::from_bigint(m_group, m_rfc6979->nonce_for(m_bytes: m.serialize()));

It's interesting that as code becomes faster, tiny optimizations become more meaningful. We're already under 250K cycles for ECDSA signatures for secp256r1 (for pcurves). At this point saving even a thousand cycles can be noticeable in the overall throughput.

I left a todo regarding this in #4027

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was somewhat surprised in #4143 that ECDSA once bridged was slower than our "demo" ECDSA directly using pcurves. I initially ascribed it to overhead of the abstractions but it turns out almost all of the overhead is RFC 6979. Almost 50K cycles on my laptop. I think, while implementing this in terms of HMAC_DRBG was convenient, it's not very fast...

@randombit randombit force-pushed the jack/new-ec-types branch 2 times, most recently from 2450e74 to 5f46292 Compare June 15, 2024 13:13
@randombit
Copy link
Owner Author

BTW I should mention in terms of reviewing, the overall structure behind the new types isn't that important since it's going to change quite a bit to handle the possibility of different backend implementations. That part is literally just the first thing that came to mind that works. Whereas EC_Scalar and EC_AffinePoints interfaces, and how they are used to implement the various schemes, are IMO "final" and so should deserve more scrutiny.

@randombit randombit force-pushed the jack/new-ec-types branch 5 times, most recently from 08980de to eb4f1bd Compare June 21, 2024 14:36
@randombit
Copy link
Owner Author

OK with eb4f1bd in place, previous comment can be disregarded. I think this is more or less the "final" [*] approach for implementing the new scalar and point types. You can see in #4143 what the bridge looks like for pcurves.

[*] Not ideal in any way, I'd like there to be fewer allocations, dynamic_cast, etc in here - but it works and doesn't seem to have that much overhead in practice. I expect we'll iterate on this in the future. A lot of what I'd like to do isn't possible until we've eliminated the real weirdo curves and all the deprecated interfaces, EC_Point, etc. [**]

[**] Not technically true - we could get away with a third internal EC implementation - classic BigInt approach, pcurves, plus a third generic curve but bounded size impl. But it just doesn't seem worth it from a complexity perspective. I expect that in Botan 4, with the restrictions on curves in place, we'll transition away from BigInt entirely within EC.

@randombit randombit modified the milestones: Botan 3.5.0, Botan 3.6.0 Jun 25, 2024
@randombit
Copy link
Owner Author

Deferring to 3.6 - it's better that all of this cook in master for a few months before we commit to anything wrt public API

src/lib/pubkey/rfc6979/rfc6979.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/rfc6979/rfc6979.cpp Outdated Show resolved Hide resolved
Comment on lines 64 to 69
uint8_t carry = 0;
for(size_t i = 0; i != m_rng_out.size(); ++i) {
const uint8_t w = m_rng_out[i];
m_rng_out[i] = (w >> shift) | carry;
carry = w << (8 - shift);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all sure this is any worth it, performance wise. But we could do the shifting on 64-bit words (for instance) as long as possible (i'm assuming that m_rlen isn't necessarily divisible by 8.

Suggested change
uint8_t carry = 0;
for(size_t i = 0; i != m_rng_out.size(); ++i) {
const uint8_t w = m_rng_out[i];
m_rng_out[i] = (w >> shift) | carry;
carry = w << (8 - shift);
}
uint8_t carry = 0;
BufferSlicer slicer(m_rng_out);
BufferStuffer stuffer(m_rng_out);
while(slicer.remaining() >= 8) {
const auto w = load_be(slicer.take<8>());
stuffer.append(store_be((w >> shift) | carry));
carry = w << (64 - shift);
}
while(!slicer.empty()) {
const uint8_t w = slicer.take_byte();
stuffer.append((w >> shift) | carry);
carry = w << (8 - shift);
}

The BufferTransformer proposed in #3942 could slim this down.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In more general terms: this tiered transformation could even be a feature of the BufferTransformer, where one passes a lambda and the BufferTransformer takes care of the load/stores and the optimal strides. Like so:

BufferTransformer bt(m_rng_out);
bt.transform([carry = 0, shift](std::unsigned_integral auto i) mutable -> decltype(i) {
   const auto r = (i >> shift) | carry;
   carry = i << (sizeof(i) * 8 - shift);
   return r;
});

... just thinking out loud, I'm aware that this might be a vast over-engineering. However, I believe, in #3883 I could use this for the bitvector<> implementation that needs something similar for its bit operations.

Also, e.g. for block ciphers BT::transform could be overloaded with a statically-sized span to factor out the loop and buffer handling.

Copy link
Owner Author

@randombit randombit Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, this shift only occurs for curves where the group order is not a multiple of 8, which consists of a set of weird curves (P239, etc almost all of which are deprecated already) plus P-521. So here I don’t think it matters much, especially when HMAC_DRBG is so expensive.

However I do think this general concept is worth exploring. A related (but possibly dissimilar enough that it’s not worth attempting to handle in the same abstraction) issue is in the mp code where we have explict unrolling for 8 and sometimes 4 words for various algorithms followed by a loop that handles the tail one word at a time. If we did this systematically for 16,8,4,2,1 word increments, that would likely lead to some nice performance improvements.

Copy link
Collaborator

@reneme reneme Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: This draft of BufferTransformer actually demonstates how we could systematically do the tiered block processing. Along those lines:

BufferTransformer bt(m_rng_out);
uint8_t carry = 0;
bt.process_blocks_of<8, 1>([&]<size_t s>(std::span<const uint8_t, s> in, std::span<uint8_t, s> out) {
   const auto i = load_be(in);
   using block_type = decltype(i);
   const auto r = static_cast<block_type>((i >> shift) | static_cast<block_type>(carry));
   carry = i << (sizeof(block_type) * 8 - shift);
   store_be(out, r);
});

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/lib/pubkey/rfc6979/rfc6979.cpp Outdated Show resolved Hide resolved
@randombit
Copy link
Owner Author

@reneme Was not sure if you were planning further review of this

@reneme
Copy link
Collaborator

reneme commented Jul 10, 2024

Was not sure if you were planning further review of this

I believe I wasn't fully done but no need to block the merge.

@randombit randombit force-pushed the jack/new-ec-types branch 2 times, most recently from f78faf5 to 2ada5e5 Compare July 10, 2024 10:51
@randombit
Copy link
Owner Author

OK I'm going to go ahead and merge. Feel free to do a retro-review anytime, we have plenty of time between now and 3.6

randombit and others added 10 commits July 10, 2024 08:38
These have a much more restrictive interface, as compared to our
existing EC_Point and BigInt

Co-authored-by: René Meusel <[email protected]>
Co-authored-by: René Meusel <[email protected]>
Co-authored-by: René Meusel <[email protected]>
This will allow switching in pcurves later
@randombit randombit force-pushed the jack/new-ec-types branch from 2ada5e5 to 8042e9d Compare July 10, 2024 12:40
@randombit randombit merged commit 45be74e into master Jul 10, 2024
39 checks passed
@randombit randombit deleted the jack/new-ec-types branch July 10, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants