Skip to content

Commit

Permalink
Initial round of review comments.
Browse files Browse the repository at this point in the history
1. Change KYBER to MLKEM throughout.
2. Fix typos.
3. Add emphasis and clarification of non-obvious rules.

Signed-off-by: Rod Chapman <[email protected]>
  • Loading branch information
rod-chapman committed Nov 5, 2024
1 parent a4c4297 commit e420940
Showing 1 changed file with 28 additions and 28 deletions.
56 changes: 28 additions & 28 deletions cbmc/proofs/proof_guide.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
[//]: # (SPDX-License-Identifier: CC-BY-4.0)
d[//]: # (SPDX-License-Identifier: CC-BY-4.0)

# CBMC Proof Guide and Cookbook for MLKEM-C

Expand Down Expand Up @@ -28,10 +28,10 @@ for (int i = 0; i < C; i++) {
```
Notes:
1. It is good practice to declare the counter variable locally, within the scope of the loop.
2. The counter variable should be a constant within the loop body. In the example above, DO NOT modify `i` in the body of the loop.
2. The counter variable should be a constant within the loop body. In the example above, do _not_ modify `i` in the body of the loop.
3. "int" is the best type for the counter variable. "unsigned" integer types complicate matters with their modulo N semantics. Avoid "size_t" since its large range (possibly unsigned 64 bit) can slow proofs down.

CBMC requires basic assigns, loop-invariant, and decreases contracts _in exactly that order_. Note that the contracts appear _before_ the opening `{` of the loop body, so we also need to tell `clang-format` NOT to reformat these lines. The basic pattern is thus:
CBMC requires basic assigns, loop-invariant, and decreases contracts _in exactly that order_. Note that the contracts appear _before_ the opening `{` of the loop body, so we also need to tell `clang-format` not to reformat these lines. The basic pattern is thus:

```
for (int i = 0; i < C; i++)
Expand All @@ -46,7 +46,7 @@ DECREASES(C - i)
}
```

The `i <= C` in the invariant is NOT a typo. CBMC places the invariant just _after_ the loop counter has been incremented, but just _before_ the loop exit test, so it is possible for `i == C` at the invariant on the final iteration of the loop.
The `i <= C` in the invariant is _not_ a typo. CBMC places the invariant just _after_ the loop counter has been incremented, but just _before_ the loop exit test, so it is possible for `i == C` at the invariant on the final iteration of the loop.

### Iterating over an array for a for loop

Expand Down Expand Up @@ -195,7 +195,7 @@ These steps are expanded on in the following sub-sections

For MLKEM-C-AArch64, proof directories lie below `cbmc/proofs`

Create a new sub-directory in there, where the name of the directory is the name of the function _without_ the `$(KYBER_NAMESPACE)` prefix.
Create a new sub-directory in there, where the name of the directory is the name of the function _without_ the `$(MLKEM_NAMESPACE)` prefix.

That directory needs to contain 3 files.

Expand All @@ -209,36 +209,36 @@ I suggest that you copy these files from an existing proof directory and modify

### Update Makefile

The `Makefile` sets options and targets for this proof. Let's imagine that the function we want to prove is called `XXX` (without the `$(KYBER_NAMESPACE)` prefix).
The `Makefile` sets options and targets for this proof. Let's imagine that the function we want to prove is called `XXX` (without the `$(MLKEM_NAMESPACE)` prefix).

Edit the Makefile and update the definition of the following variables:

* HARNESS_FILE - should be `XXX_harness`
* PROOF_UID - should be `XXX`
* PROJECT_SOURCES - should the files containing the source code of XXX
* CHECK_FUNCTION_CONTRACTS - set to the `XXX`, but _with_ the `$(KYBER_NAMESPACE)` prefix if required
* USE_FUNCTION_CONTRACTS - a list of functions that `XXX` calls where you want CBMC to use the contracts of the called function for proof, rather than 'inlining' the called function for proof. Include the `$(KYBER_NAMESPACE)` prefix if required
* CHECK_FUNCTION_CONTRACTS - set to the `XXX`, but _with_ the `$(MLKEM_NAMESPACE)` prefix if required
* USE_FUNCTION_CONTRACTS - a list of functions that `XXX` calls where you want CBMC to use the contracts of the called function for proof, rather than 'inlining' the called function for proof. Include the `$(MLKEM_NAMESPACE)` prefix if required
* CBMCFLAGS - additional flags to pass to the final run of CBMC. This is normally set to `--smt2` which tells cbmc to run Z3 as its underlying solver. Can also be set to `--bitwuzla` which is sometimes better at generaing counter-examples when Z3 fails.
* FUNCTION_NAME - set to `XXX` with the `$(KYBER_NAMESPACE)` prefix if required
* FUNCTION_NAME - set to `XXX` with the `$(MLKEM_NAMESPACE)` prefix if required
* CBMC_OBJECT_BITS. Normally set to 8, but might need to be increased if CBMC runs out of memory for this proof.

For documentation of these (and the other) options, see the `cbmc/proofs/Makefile.common` file.

### Update harness function

The file `XXX_harness.c` should declare a single function called `XXX_harness()` that calls `XXX` exactly one, with appropriately typed parameters. Using contracts, this harness function should NOT need to contain any CBMC `ASSUME` or `ASSERT` statements at all.
The file `XXX_harness.c` should declare a single function called `XXX_harness()` that calls `XXX` exactly one, with appropriately typed parameters. Using contracts, this harness function should not need to contain any CBMC `ASSUME` or `ASSERT` statements at all.

# Supply top-level contracts

Now for the tricky bit. Does `XXX` need any top-level pre- and post-condition contracts, other that those inferred by its type-signature?

A common case is where a function has an array parameter, where the numeric range of the array's elements are NOT just the range of the underlying predefined type.
A common case is where a function has an array parameter, where the numeric range of the array's elements are not just the range of the underlying predefined type.

Secondly, a parameter of a predefined integer type might have to be constrained to a small range.
Similarly, a parameter of a predefined integer type might have to be constrained to a small range.

Check for any BOUND macros in the body that give range constraints on the parmeters, and make sure that your contracts are identical or consistent with those. Also read the comments for XXX to make sure they agree with your contracts.
Check for any BOUND macros in the body that give range constraints on the parameters, and make sure that your contracts are identical or consistent with those. Also read the comments for XXX to make sure they agree with your contracts.

The order of the top-level contracts for a function is always:
The order of the top-level contracts for a function is _always_:
```
t1 XXX(params...)
// clang-format off
Expand All @@ -247,7 +247,7 @@ ASSIGNS()
ENSURES();
// clang-format on
```
with a final semi-colon on the end of the last one.
with a final semi-colon on the end of the last one. That final semi-colon is needed to terminate the function declaration as per normal C syntax.

### Interior contracts and loop invariants

Expand All @@ -264,7 +264,7 @@ This produces `logs/result.txt` in plaintext format.
Before pushing a new proof for a new function, make sure that _all_ proofs run OK from the `cbmc/proofs` directory with

```
export KYBER_K=3 # or 2 or 4...
export MLKEM_K=3 # or 2 or 4...
./run-cbmc-proofs.py --summarize -j8
```

Expand All @@ -285,9 +285,9 @@ The significant changes are:
HARNESS_FILE = poly_tobytes_harness
PROOF_UID = poly_tobytes
PROJECT_SOURCES += $(SRCDIR)/mlkem/poly.c
CHECK_FUNCTION_CONTRACTS=$(KYBER_NAMESPACE)poly_tobytes
CHECK_FUNCTION_CONTRACTS=$(MLKEM_NAMESPACE)poly_tobytes
USE_FUNCTION_CONTRACTS=
FUNCTION_NAME = $(KYBER_NAMESPACE)poly_tobytes
FUNCTION_NAME = $(MLKEM_NAMESPACE)poly_tobytes
```
Note that `USE_FUNCTION_CONTRACTS` is left empty since `poly_tobytes()` is a leaf function that does not call any other functions at all.

Expand All @@ -298,7 +298,7 @@ Note that `USE_FUNCTION_CONTRACTS` is left empty since `poly_tobytes()` is a lea
```
void harness(void) {
poly a;
uint8_t r[KYBER_POLYBYTES];
uint8_t r[MLKEM_POLYBYTES];
/* Contracts for this function are in poly.h */
poly_tobytes(r, &a);
Expand All @@ -315,17 +315,17 @@ The comments on `poly_tobytes()` give us a clear hint:
* with each coefficient in the range [0,1,..,Q-1]
* OUTPUT
* - r: pointer to output byte array
* (of KYBER_POLYBYTES bytes)
* (of MLKEM_POLYBYTES bytes)
```
So we need to write a REQUIRES contract to constrain the ranges of the coefficients denoted by the parameter `a`. There is no constraint on the output byte array, other than it must be the right length, which is given by the function prototype.

We can use the macros in `cbmc.h` to help, thus:

```
void poly_tobytes(uint8_t r[KYBER_POLYBYTES], const poly *a)
void poly_tobytes(uint8_t r[MLKEM_POLYBYTES], const poly *a)
// clang-format off
REQUIRES(a != NULL && IS_FRESH(a, sizeof(poly)))
REQUIRES(ARRAY_IN_BOUNDS(int, k, 0, (KYBER_N - 1), a->coeffs, 0, (KYBER_Q - 1)))
REQUIRES(ARRAY_IN_BOUNDS(int, k, 0, (MLKEM_N - 1), a->coeffs, 0, (MLKEM_Q - 1)))
ASSIGNS(OBJECT_WHOLE(r));
// clang-format on
```
Expand All @@ -335,24 +335,24 @@ ASSIGNS(OBJECT_WHOLE(r));
`poly_tobytes` has a reasonable simple, single loop statement:

```
for (unsigned int i = 0; i < KYBER_N / 2; i++)
for (unsigned int i = 0; i < MLKEM_N / 2; i++)
{ ... }
```

A candidate loop contract needs to state that:
1. The loop body assigns to variable `i` and the whole object pointed to by `r`.
2. Loop counter variable `i` is in range 0 .. KYBER_N / 2 at the point of the loop invariant (remember the pattern above).
3. The loop terminates because the expression `KYBER_N / 2 - i` decreases on every iteration.
2. Loop counter variable `i` is in range 0 .. MLKEM_N / 2 at the point of the loop invariant (remember the pattern above).
3. The loop terminates because the expression `MLKEM_N / 2 - i` decreases on every iteration.

Therefore, we add:

```
for (unsigned int i = 0; i < KYBER_N / 2; i++)
for (unsigned int i = 0; i < MLKEM_N / 2; i++)
// clang-format off
ASSIGNS(i, OBJECT_WHOLE(r))
INVARIANT(i >= 0)
INVARIANT(i <= KYBER_N / 2)
DECREASES(KYBER_N / 2 - i)
INVARIANT(i <= MLKEM_N / 2)
DECREASES(MLKEM_N / 2 - i)
// clang-format on
{ ... }
```
Expand Down

0 comments on commit e420940

Please sign in to comment.