Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) #370

Merged
merged 9 commits into from
Nov 11, 2024

Conversation

hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Nov 9, 2024

This PR brings together the work of the past weeks into a top-level type+memory safety proof of crypto_kem_keypair_derand() -- with only NTT and SHA3 axiomatized, but all other lower level functions specified and proved.

A few changes had to be made to the existing lower level proofs -- see the individual commit messages for details.

@hanno-becker hanno-becker force-pushed the indcpa branch 2 times, most recently from 9e6a575 to cb061bc Compare November 9, 2024 06:23
@hanno-becker hanno-becker changed the title CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) Nov 9, 2024
@hanno-becker hanno-becker changed the title CBMC: Prooff of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) CBMC: Proof of type safety and memory safety for toplevel MLKEM KeyGen (modulo NTT+SHA3) Nov 9, 2024
@hanno-becker hanno-becker force-pushed the indcpa branch 4 times, most recently from 3f16ec4 to 213e730 Compare November 9, 2024 19:29
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @hanno-becker. Congratulations to the first proof of a high-level function! Looks like things are coming together.

Some suggestions and questions.

mlkem/polyvec.h Show resolved Hide resolved
mlkem/indcpa.c Outdated Show resolved Hide resolved
mlkem/poly.h Show resolved Hide resolved
mlkem/polyvec.h Show resolved Hide resolved
mlkem/indcpa.c Outdated Show resolved Hide resolved
Attempts to verify the toplevel MLKEM-INDCPA API have showed
that the matrix-vector multiplication embedded in the keypair
generation (and, likely, also the one in the encapsulation)
are causing performance issues for CBMC verification.

This commit hoists the matrix-vector multiplication loop,
excluding the preceeding mulcache computation, into a helper
function `matvec_mul`. This helper function is specified
and proved independently.

To make this work, the call to `poly_tomont()` in the matvec
multiplication loop inside key generation needs to be hoisted
out of the loop and replaced by a call to the new function
`polyvec_tomont()`, which is also given the obvious spec+proof.

Signed-off-by: Hanno Becker <[email protected]>
Previously, the spec for polyvec_basemul_acc_montgomery_cached()
would mark the entire destination object around the target
polynomial as tainted. This is too strong if the caller passes
a polynomial which is part of a polynomial vector -- which is
the case during matrix vector multiplication.

This commit strengthens the assignment clause for
polyvec_basemul_acc_montgomery_cached to mark only a poly-sized
slice as tainted.

Similarly, the specification of poly_tomont() is strengthened.

Signed-off-by: Hanno Becker <[email protected]>
Our type+memory safety proof in CBMC does not require any specific
knowledge of what the NTT does. We do, however, need a bound on the
coefficients of the output polynomial, in order to prove type safety
(non-overflow) when the caller subsequently adds to the output.

The definition of the NTT and invNTT bounds are, likely temporarily,
moved to poly.h, to avoid a cyclic dependency between headers.

Signed-off-by: Hanno Becker <[email protected]>
Previously, the assign clauses in poly_cbd_eta{1,2} would see the
entire object containing the target polynomial as tainted. This is
too strong if the target polynoimal is part of a polynomial vector.

This commit strengthens the specifications to only taint the
poly-sized slice spanned by the target polynomial.

Signed-off-by: Hanno Becker <[email protected]>
Previously, poly_getnoise_eta1_4x() was specified as
requiring four output polynomials satisfying IS_FRESH(...).
While intuitively we only want to express that the output polynomials
are disjoint, the obligation generated for a caller is, in fact, much
stronger: The target polynomials cannot belong to the same object.

In particular, the current specification would not allow to pass
in four consecutive slices of a polyvec, or say 2 consecutive polyvecs
in the first and last two arguments each.

Those limitations will prevent verification of indcpa_keypair_derand,
which does indeed pass slices of polyvecs to the getnoise functions.

To accommodate, this commit introduces multiple preconditions
varying on in the amount of object-sharing that's permitted:
For MLKEM_K=4, the output polynomials must be consecutive slices
of the same object. For MLKEM_K=2, the first and last two must be
consecutive, but belong to separate objects. For MLKEM_K=3, the
first three must be consecutive, and the forth is unused and must
be NULL.

For MLKEM_K=3, which needs sampling of 6 polynomials during keygen,
the code previously used 2 dummy buffers for the two unused output
slots. This commit changes this behaviour and instead passes NULL
as the output buffer, and drops the respective write. This is both
cleaner and faster.

Object sharing in the sibling functions poly_getnoise_eta2_4x
and poly_getnoise_eta1122_4x will be addressed once they are
needed.

One can attempt to use a single REQUIRES(...) clause with a large
disjunction, which proves OK but seems to cause issues at call-time;
this needs further investigation.

Signed-off-by: Hanno Becker <[email protected]>
This commit adds a tentative specification, but no proof, for the
hash_g/sha3_512 function used by the toplevel INDCPA API.

It appears that hash_g is called in two versions:
- With input and output equal
- With input and output disjoint

The tentative spec aims to accommodate both cases.

Signed-off-by: Hanno Becker <[email protected]>
This commit brings together the previous workstream of low-level
CBMC proofs, into the proof of type safety and memory safety for
the MLKEM-INDCPA-KeyGen function `indcpa_keypair_derand()`.

In terms of type safety, what the proof in particular shows is that
the input/output bounds of the various arithmetic functions are
coherent, and that there is hence no risk of arithmetic overflow.

Signed-off-by: Hanno Becker <[email protected]>
This commit brings together the numerous low-level proofs to
a proof of type-safety and memory-safety of the top-level
MLKEM key generation API `crypto_kem_keypair_derand()`.

Signed-off-by: Hanno Becker <[email protected]>
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now - thanks @hanno-becker for making the changes.

@mkannwischer mkannwischer mentioned this pull request Nov 11, 2024
This commit disables CBMC coverage generation in CI, which is
very time-consuming and not what we are interested in currently.

Signed-off-by: Hanno Becker <[email protected]>
@hanno-becker hanno-becker merged commit 7b4928a into main Nov 11, 2024
33 checks passed
@hanno-becker hanno-becker deleted the indcpa branch November 11, 2024 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants