CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

hanno-becker · 2024-11-09T06:17:51Z

This PR brings together the work of the past weeks into a top-level type+memory safety proof of crypto_kem_keypair_derand() -- with only NTT and SHA3 axiomatized, but all other lower level functions specified and proved.

A few changes had to be made to the existing lower level proofs -- see the individual commit messages for details.

mkannwischer

Nice @hanno-becker. Congratulations to the first proof of a high-level function! Looks like things are coming together.

Some suggestions and questions.

mlkem/polyvec.h

mlkem/indcpa.c

mlkem/poly.h

mlkem/polyvec.h

mlkem/indcpa.c

Attempts to verify the toplevel MLKEM-INDCPA API have showed that the matrix-vector multiplication embedded in the keypair generation (and, likely, also the one in the encapsulation) are causing performance issues for CBMC verification. This commit hoists the matrix-vector multiplication loop, excluding the preceeding mulcache computation, into a helper function `matvec_mul`. This helper function is specified and proved independently. To make this work, the call to `poly_tomont()` in the matvec multiplication loop inside key generation needs to be hoisted out of the loop and replaced by a call to the new function `polyvec_tomont()`, which is also given the obvious spec+proof. Signed-off-by: Hanno Becker <[email protected]>

Previously, the spec for polyvec_basemul_acc_montgomery_cached() would mark the entire destination object around the target polynomial as tainted. This is too strong if the caller passes a polynomial which is part of a polynomial vector -- which is the case during matrix vector multiplication. This commit strengthens the assignment clause for polyvec_basemul_acc_montgomery_cached to mark only a poly-sized slice as tainted. Similarly, the specification of poly_tomont() is strengthened. Signed-off-by: Hanno Becker <[email protected]>

Our type+memory safety proof in CBMC does not require any specific knowledge of what the NTT does. We do, however, need a bound on the coefficients of the output polynomial, in order to prove type safety (non-overflow) when the caller subsequently adds to the output. The definition of the NTT and invNTT bounds are, likely temporarily, moved to poly.h, to avoid a cyclic dependency between headers. Signed-off-by: Hanno Becker <[email protected]>

Previously, the assign clauses in poly_cbd_eta{1,2} would see the entire object containing the target polynomial as tainted. This is too strong if the target polynoimal is part of a polynomial vector. This commit strengthens the specifications to only taint the poly-sized slice spanned by the target polynomial. Signed-off-by: Hanno Becker <[email protected]>

Previously, poly_getnoise_eta1_4x() was specified as requiring four output polynomials satisfying IS_FRESH(...). While intuitively we only want to express that the output polynomials are disjoint, the obligation generated for a caller is, in fact, much stronger: The target polynomials cannot belong to the same object. In particular, the current specification would not allow to pass in four consecutive slices of a polyvec, or say 2 consecutive polyvecs in the first and last two arguments each. Those limitations will prevent verification of indcpa_keypair_derand, which does indeed pass slices of polyvecs to the getnoise functions. To accommodate, this commit introduces multiple preconditions varying on in the amount of object-sharing that's permitted: For MLKEM_K=4, the output polynomials must be consecutive slices of the same object. For MLKEM_K=2, the first and last two must be consecutive, but belong to separate objects. For MLKEM_K=3, the first three must be consecutive, and the forth is unused and must be NULL. For MLKEM_K=3, which needs sampling of 6 polynomials during keygen, the code previously used 2 dummy buffers for the two unused output slots. This commit changes this behaviour and instead passes NULL as the output buffer, and drops the respective write. This is both cleaner and faster. Object sharing in the sibling functions poly_getnoise_eta2_4x and poly_getnoise_eta1122_4x will be addressed once they are needed. One can attempt to use a single REQUIRES(...) clause with a large disjunction, which proves OK but seems to cause issues at call-time; this needs further investigation. Signed-off-by: Hanno Becker <[email protected]>

This commit adds a tentative specification, but no proof, for the hash_g/sha3_512 function used by the toplevel INDCPA API. It appears that hash_g is called in two versions: - With input and output equal - With input and output disjoint The tentative spec aims to accommodate both cases. Signed-off-by: Hanno Becker <[email protected]>

This commit brings together the previous workstream of low-level CBMC proofs, into the proof of type safety and memory safety for the MLKEM-INDCPA-KeyGen function `indcpa_keypair_derand()`. In terms of type safety, what the proof in particular shows is that the input/output bounds of the various arithmetic functions are coherent, and that there is hence no risk of arithmetic overflow. Signed-off-by: Hanno Becker <[email protected]>

This commit brings together the numerous low-level proofs to a proof of type-safety and memory-safety of the top-level MLKEM key generation API `crypto_kem_keypair_derand()`. Signed-off-by: Hanno Becker <[email protected]>

mkannwischer

LGTM now - thanks @hanno-becker for making the changes.

This commit disables CBMC coverage generation in CI, which is very time-consuming and not what we are interested in currently. Signed-off-by: Hanno Becker <[email protected]>

hanno-becker force-pushed the indcpa branch 2 times, most recently from 9e6a575 to cb061bc Compare November 9, 2024 06:23

hanno-becker changed the title ~~CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen~~ CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) Nov 9, 2024

hanno-becker changed the title ~~CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3)~~ CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) Nov 9, 2024

hanno-becker force-pushed the indcpa branch 4 times, most recently from 3f16ec4 to 213e730 Compare November 9, 2024 19:29

hanno-becker requested a review from mkannwischer November 10, 2024 06:13

mkannwischer requested changes Nov 11, 2024

View reviewed changes

mlkem/polyvec.h Show resolved Hide resolved

mlkem/indcpa.c Outdated Show resolved Hide resolved

mlkem/poly.h Show resolved Hide resolved

mlkem/polyvec.h Show resolved Hide resolved

mlkem/indcpa.c Outdated Show resolved Hide resolved

hanno-becker force-pushed the indcpa branch 2 times, most recently from 33b67ac to eb5e55b Compare November 11, 2024 09:48

hanno-becker requested a review from mkannwischer November 11, 2024 09:56

hanno-becker force-pushed the indcpa branch from eb5e55b to 02dd85c Compare November 11, 2024 09:56

hanno-becker added 8 commits November 11, 2024 10:10

CBMC: Prove type+memory-safety of crypto_kem_keypair_derand

055f112

This commit brings together the numerous low-level proofs to a proof of type-safety and memory-safety of the top-level MLKEM key generation API `crypto_kem_keypair_derand()`. Signed-off-by: Hanno Becker <[email protected]>

hanno-becker force-pushed the indcpa branch from 02dd85c to d0a0be0 Compare November 11, 2024 10:10

mkannwischer approved these changes Nov 11, 2024

View reviewed changes

mkannwischer mentioned this pull request Nov 11, 2024

Minor CBMC clean-up #373

Merged

CBMC: Disable generation of coverage information

8fc762a

This commit disables CBMC coverage generation in CI, which is very time-consuming and not what we are interested in currently. Signed-off-by: Hanno Becker <[email protected]>

hanno-becker force-pushed the indcpa branch from d0a0be0 to 8fc762a Compare November 11, 2024 11:54

hanno-becker merged commit 7b4928a into main Nov 11, 2024
33 checks passed

hanno-becker deleted the indcpa branch November 11, 2024 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

hanno-becker commented Nov 9, 2024 •

edited

Loading

mkannwischer left a comment

mkannwischer left a comment

CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

Conversation

hanno-becker commented Nov 9, 2024 • edited Loading

mkannwischer left a comment

Choose a reason for hiding this comment

mkannwischer left a comment

Choose a reason for hiding this comment

hanno-becker commented Nov 9, 2024 •

edited

Loading