From 9f272ab1f9476802655b4daaad518481361349fb Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 18 Apr 2024 17:06:02 +0800 Subject: [PATCH 1/9] describe goals in README Signed-off-by: Matthias J. Kannwischer --- README.md | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index ec54e2eaf..f4af653d5 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,42 @@ [//]: # (SPDX-License-Identifier: CC-BY-4.0) -[//]: # (TODO Customize project readme) -# template-code +**MLKEM-C-AArch64** is a collection of [MLKEM](https://doi.org/10.6028/NIST.FIPS.203.ipd) implementations optimized for a number of different Armv8-A and Armv9-A microarchitectures. -Template for creating code repositories, with basic file setup included +Initially, the primary target platforms are: + - Arm Cortex-A72 (as used in the Raspberry Pi 4) + - Apple M1 + - AWS [Graviton 4](https://press.aboutamazon.com/2023/11/aws-unveils-next-generation-aws-designed-chips) instances based on [Arm Neoverse V2](https://developer.arm.com/Processors/Neoverse%20V2) + + +## Goals of MLKEM-C-AArch64 + +The goals of this project are as follows: + +- Provide production-grade code that can be dropped into other projects. +- Be permissively licensed, with all code released under the Apache-2.0 license. +- Be tested against the official reference known-answer tests (KATs) and extended KATs (taken from another [PQCP](https://github.com/pq-code-package) project). +- Include Neon assembly implementations of the core building blocks of MLKEM that perform well on a wide range of Armv8-A and Armv9-A platforms. +- Achieve performance matching the state of the art on the target platforms. +- Do not sacrifice maintainability; assembly should be as readable as possible. We use automated tooling for microarchitecture-specific optimization (e.g., [SLOTHY](https://slothy-optimizer.github.io/slothy/)).
+- Provide a unified interface for Keccak implementations, allowing 2-way, 4-way, and 8-way parallel implementations depending on the target microarchitecture. +- Eventually, we aim to unify the implementations with those in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). However, we believe that for AArch64, there are too many relevant microarchitectures to come up with a single implementation that performs well on all of them. + + +## Current state + +**MLKEM-C-AArch64** is currently a work in progress and we do not recommend relying on it at this point. +**WE DO NOT CURRENTLY RECOMMEND RELYING ON THIS LIBRARY IN A PRODUCTION ENVIRONMENT OR TO PROTECT ANY SENSITIVE DATA.** +Once we have the first stable version, this notice will be removed. + +The current code is compatible with the [`standard` branch of the official MLKEM repository](https://github.com/pq-crystals/kyber/tree/standard). + +## Call for contributors + +We are actively seeking contributors who can help us build **MLKEM-C-AArch64**. +If you are interested, please contact us, or volunteer for any of the open issues. + +## Call for potential consumers + +If you are a potential consumer of **MLKEM-C-AArch64**, please reach out to us. +We are interested in hearing how you are considering using **MLKEM-C-AArch64** and whether you could benefit from additional features. +If you have specific feature requests, please open an issue.
From d52130cb485c7e85cad16c4a4526c0ee9124c32e Mon Sep 17 00:00:00 2001 From: cothan Date: Tue, 7 May 2024 02:03:12 -0400 Subject: [PATCH 2/9] Add checksum KAT to Github Workflow (#29) * Add checksum KAT to Github Action Signed-off-by: Duc Tri Nguyen * add make mlkem and make clean to build Signed-off-by: Duc Tri Nguyen * add checksum for test_kyber* Signed-off-by: Duc Tri Nguyen * Add NISTKAT Signed-off-by: Duc Tri Nguyen * Add hashsum for NISTKAT Signed-off-by: Duc Tri Nguyen * Add SPDX header Signed-off-by: Duc Tri Nguyen * Add NISTKAT to Makefile Signed-off-by: Duc Tri Nguyen * Update gitignore Signed-off-by: Duc Tri Nguyen * fix format Signed-off-by: Duc Tri Nguyen * forward output to pipe directly Signed-off-by: Duc Tri Nguyen * extract 1st column Signed-off-by: Duc Tri Nguyen * remove for loop Signed-off-by: Duc Tri Nguyen * Simplify Makefile Signed-off-by: Duc Tri Nguyen * check return code only Signed-off-by: Duc Tri Nguyen * Update .github/workflows/build.yml Co-authored-by: Matthias J. Kannwischer Signed-off-by: cothan * replace nistkat Signed-off-by: Duc Tri Nguyen * fix incorrect space Signed-off-by: Duc Tri Nguyen --------- Signed-off-by: Duc Tri Nguyen Signed-off-by: cothan Co-authored-by: Matthias J. 
Kannwischer --- .github/workflows/build.yml | 45 ++- .gitignore | 6 + Makefile | 68 +++- checksum.sh | 14 + test/gen_KAT.c | 64 ++++ test/gen_NISTKAT.c | 89 +++++ test/nistrng/aes.c | 627 ++++++++++++++++++++++++++++++++++++ test/nistrng/aes.h | 66 ++++ test/nistrng/randombytes.h | 13 + test/nistrng/rng.c | 87 +++++ 10 files changed, 1054 insertions(+), 25 deletions(-) create mode 100755 checksum.sh create mode 100644 test/gen_KAT.c create mode 100644 test/gen_NISTKAT.c create mode 100644 test/nistrng/aes.c create mode 100644 test/nistrng/aes.h create mode 100644 test/nistrng/randombytes.h create mode 100644 test/nistrng/rng.c diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 7f8ef55b2..cc75d375b 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -3,19 +3,19 @@ name: Build on: push: - branches: [ '*' ] + branches: ["*"] pull_request: - branches: [ "main" ] + branches: ["main"] jobs: build_test: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - name: Setup nix - uses: ./.github/actions/setup-nix - - name: Astyle - shell: nix develop .#ci -c bash -e {0} - run: | + - uses: actions/checkout@v4 + - name: Setup nix + uses: ./.github/actions/setup-nix + - name: Astyle + shell: nix develop .#ci -c bash -e {0} + run: | err=$(astyle $(git ls-files "*.c" "*.h") --options=.astylerc --dry-run --formatted | awk '{print $2}') if [[ ${#err} != 0 ]]; then echo "$err" | while IFS= read -r file; do @@ -23,7 +23,28 @@ jobs: done exit 1 fi - - name: Build targets - shell: nix develop .#ci -c bash -e {0} - run: | - make \ No newline at end of file + - name: Build targets + shell: nix develop .#ci -c bash -e {0} + run: | + make mlkem + ./test/test_kyber512 + ./test/test_kyber768 + ./test/test_kyber1024 + - name: Compare gen_KAT with known hash + shell: nix develop .#ci -c bash -e {0} + run: | + make kat; + ./checksum.sh ./test/gen_KAT512 ec4ac397e595ac7457cb7d8830921faf3290898a10d7dd3864aab89ea61fe9a3 + ./checksum.sh 
./test/gen_KAT768 9a0826ad3c5232dfd3b21bc4801408655c565a491b760f509b2ee2cd7180babe + ./checksum.sh ./test/gen_KAT1024 6dafb867599b750a6a831b03e494cf41dea748c78a0e275e7b268bbb893cf37d + - name: Compare gen_NISTKAT with known hash + shell: nix develop .#ci -c bash -e {0} + run: | + make nistkat; + ./checksum.sh ./test/gen_NISTKAT512 4b88ac7643ff60209af1175e025f354272e88df827a0ce1c056e403629b88e04 + ./checksum.sh ./test/gen_NISTKAT768 21b4a1e1ea34a13c26a9da5eeb9325afb5ca11596ca6f3704c3f2637e3ea7524 + ./checksum.sh ./test/gen_NISTKAT1024 6471398b0a728ee1ef39e93bb89b526fbf59587a3662edadbcfc6c88a512cd71 + - name: Clean up + shell: nix develop .#ci -c bash -e {0} + run: | + make clean diff --git a/.gitignore b/.gitignore index 2db7dfcca..fa4718f13 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,9 @@ test/test_kyber512 test/test_kyber768 test/test_kyber1024 +test/gen_KAT512 +test/gen_KAT768 +test/gen_KAT1024 +test/gen_NISTKAT512 +test/gen_NISTKAT768 +test/gen_NISTKAT1024 diff --git a/Makefile b/Makefile index d7f3f17bf..9ab5ac38f 100644 --- a/Makefile +++ b/Makefile @@ -1,39 +1,81 @@ CC ?= /usr/bin/cc -CFLAGS_FIPS202 = -I fips202 -CFLAGS_MLKEM = -I mlkem -CFLAGS_RANDOMBYTES = -I randombytes -CFLAGS_TEST = -I test +INCLUDE_FIPS202 = -I fips202 +INCLUDE_MLKEM = -I mlkem +INCLUDE_RANDOM = -I randombytes +INCLUDE_NISTRANDOM = -I test/nistrng CFLAGS += -Wall -Wextra -Wpedantic -Wmissing-prototypes -Wredundant-decls \ -Wshadow -Wpointer-arith -O3 -fomit-frame-pointer -pedantic \ - ${CFLAGS_RANDOMBYTES} ${CFLAGS_MLKEM} ${CFLAGS_FIPS202} ${CFLAGS_TEST} + ${INCLUDE_MLKEM} ${INCLUDE_FIPS202} +CFLAGS_RANDOMBYTES = ${CFLAGS} ${INCLUDE_RANDOM} +CFLAGS_NISTRANDOMBYTES = ${CFLAGS} ${INCLUDE_NISTRANDOM} NISTFLAGS += -Wno-unused-result -O3 -fomit-frame-pointer RM = /bin/rm SOURCES = mlkem/kem.c mlkem/indcpa.c mlkem/polyvec.c mlkem/poly.c mlkem/ntt.c mlkem/cbd.c mlkem/reduce.c mlkem/verify.c SOURCESKECCAK = $(SOURCES) fips202/keccakf1600.c fips202/fips202.c mlkem/symmetric-shake.c 
+SOURCESKECCAKRANDOM = $(SOURCESKECCAK) randombytes/randombytes.c +SOURCESNISTKATS = $(SOURCESKECCAK) test/nistrng/aes.c test/nistrng/rng.c + HEADERS = mlkem/params.h mlkem/kem.h mlkem/indcpa.h mlkem/polyvec.h mlkem/poly.h mlkem/ntt.h mlkem/cbd.h mlkem/reduce.c mlkem/verify.h mlkem/symmetric.h HEADERSKECCAK = $(HEADERS) fips202/keccakf1600.h fips202/fips202.h +HEADERSKECCAKRANDOM = $(HEADERSKECCAK) randombytes/randombytes.h +HEADERNISTKATS = $(HEADERSKECCAK) test/nistrng/aes.h test/nistrng/randombytes.h -.PHONY: all mlkem clean +.PHONY: all mlkem kat nistkat clean -all: mlkem +all: mlkem kat nistkat mlkem: \ test/test_kyber512 \ test/test_kyber768 \ test/test_kyber1024 -test/test_kyber512: $(SOURCESKECCAK) $(HEADERSKECCAK) test/test_kyber.c randombytes/randombytes.c - $(CC) $(CFLAGS) -DKYBER_K=2 $(SOURCESKECCAK) randombytes/randombytes.c test/test_kyber.c -o $@ +nistkat: \ + test/gen_NISTKAT512 \ + test/gen_NISTKAT768 \ + test/gen_NISTKAT1024 + +kat: \ + test/gen_KAT512 \ + test/gen_KAT768 \ + test/gen_KAT1024 + +test/test_kyber512: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/test_kyber.c + $(CC) $(CFLAGS_RANDOMBYTES) -DKYBER_K=2 $(SOURCESKECCAKRANDOM) test/test_kyber.c -o $@ + +test/test_kyber768: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/test_kyber.c + $(CC) $(CFLAGS_RANDOMBYTES) -DKYBER_K=3 $(SOURCESKECCAKRANDOM) test/test_kyber.c -o $@ + +test/test_kyber1024: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/test_kyber.c + $(CC) $(CFLAGS_RANDOMBYTES) -DKYBER_K=4 $(SOURCESKECCAKRANDOM) test/test_kyber.c -o $@ + +test/gen_KAT512: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/gen_KAT.c + $(CC) $(CFLAGS_RANDOMBYTES) -DKYBER_K=2 $(SOURCESKECCAKRANDOM) test/gen_KAT.c -o $@ + +test/gen_KAT768: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/gen_KAT.c + $(CC) $(CFLAGS_RANDOMBYTES) -DKYBER_K=3 $(SOURCESKECCAKRANDOM) test/gen_KAT.c -o $@ + +test/gen_KAT1024: $(SOURCESKECCAKRANDOM) $(HEADERSKECCAKRANDOM) test/gen_KAT.c + $(CC) 
$(CFLAGS_RANDOMBYTES) -DKYBER_K=4 $(SOURCESKECCAKRANDOM) test/gen_KAT.c -o $@ + +test/gen_NISTKAT512: $(SOURCESNISTKATS) $(HEADERNISTKATS) test/gen_NISTKAT.c + $(CC) $(CFLAGS_NISTRANDOMBYTES) -DKYBER_K=2 $(SOURCESNISTKATS) test/gen_NISTKAT.c -o $@ + +test/gen_NISTKAT768: $(SOURCESNISTKATS) $(HEADERNISTKATS) test/gen_NISTKAT.c + $(CC) $(CFLAGS_NISTRANDOMBYTES) -DKYBER_K=3 $(SOURCESNISTKATS) test/gen_NISTKAT.c -o $@ -test/test_kyber768: $(SOURCESKECCAK) $(HEADERSKECCAK) test/test_kyber.c randombytes/randombytes.c - $(CC) $(CFLAGS) -DKYBER_K=3 $(SOURCESKECCAK) randombytes/randombytes.c test/test_kyber.c -o $@ +test/gen_NISTKAT1024: $(SOURCESNISTKATS) $(HEADERNISTKATS) test/gen_NISTKAT.c + $(CC) $(CFLAGS_NISTRANDOMBYTES) -DKYBER_K=4 $(SOURCESNISTKATS) test/gen_NISTKAT.c -o $@ -test/test_kyber1024: $(SOURCESKECCAK) $(HEADERSKECCAK) test/test_kyber.c randombytes/randombytes.c - $(CC) $(CFLAGS) -DKYBER_K=4 $(SOURCESKECCAK) randombytes/randombytes.c test/test_kyber.c -o $@ clean: -$(RM) -rf *.gcno *.gcda *.lcov *.o *.so -$(RM) -rf test/test_kyber512 -$(RM) -rf test/test_kyber768 -$(RM) -rf test/test_kyber1024 + -$(RM) -rf test/gen_KAT512 + -$(RM) -rf test/gen_KAT768 + -$(RM) -rf test/gen_KAT1024 + -$(RM) -rf test/gen_NISTKAT512 + -$(RM) -rf test/gen_NISTKAT768 + -$(RM) -rf test/gen_NISTKAT1024 diff --git a/checksum.sh b/checksum.sh new file mode 100755 index 000000000..9a5747e33 --- /dev/null +++ b/checksum.sh @@ -0,0 +1,14 @@ +#!/bin/bash +# SPDX-License-Identifier: Apache-2.0 + +# This script executes a binary file, captures its output, then generates and compares its SHA-256 hash with a provided one. + +output_hash=$(./$1 | sha256sum | awk '{ print $1 }') + +if [[ ${output_hash} == "${2}" ]]; then + echo "${1} Hashes match." 
+ exit 0 +else + echo "${1} Hashes do not match: ${output_hash} vs ${2}" + exit 1 +fi diff --git a/test/gen_KAT.c b/test/gen_KAT.c new file mode 100644 index 000000000..f5a994f02 --- /dev/null +++ b/test/gen_KAT.c @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: Apache-2.0 +#include +#include +#include +#include "fips202.h" +#include "kem.h" +#include "params.h" + +#define NTESTS 10000 + +static void print_hex(const char *label, const uint8_t *data, size_t size) { + printf("%s = ", label); + for (size_t i = 0; i < size; i++) { + printf("%02x", data[i]); + } + printf("\n"); +} + +static void shake256_absorb(shake256incctx *state, const uint8_t *input, size_t inlen) { + shake256_inc_init(state); + shake256_inc_absorb(state, input, inlen); + shake256_inc_finalize(state); +} + +int main(void) { + uint8_t coins[3 * KYBER_SYMBYTES]; + uint8_t pk[CRYPTO_PUBLICKEYBYTES]; + uint8_t sk[CRYPTO_SECRETKEYBYTES]; + uint8_t ct[CRYPTO_CIPHERTEXTBYTES]; + uint8_t ss1[CRYPTO_BYTES]; + uint8_t ss2[CRYPTO_BYTES]; + + const uint8_t seed[64] = {32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, + 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, + 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, + 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, + }; + + shake256incctx state; + shake256_absorb(&state, seed, sizeof(seed)); + + for (unsigned int i = 0; i < NTESTS; i++) { + + shake256_inc_squeeze(coins, sizeof(coins), &state); + + crypto_kem_keypair_derand(pk, sk, coins); + print_hex("pk", pk, sizeof(pk)); + print_hex("sk", sk, sizeof(sk)); + + crypto_kem_enc_derand(ct, ss1, pk, coins + 2 * KYBER_SYMBYTES); + print_hex("ct", ct, sizeof(ct)); + + crypto_kem_dec(ss2, ct, sk); + + if (memcmp(ss1, ss2, sizeof(ss1))) { + fprintf(stderr, "ERROR\n"); + return -1; + } + + print_hex("ss", ss1, sizeof(ss1)); + } + + return 0; +} diff --git a/test/gen_NISTKAT.c b/test/gen_NISTKAT.c new file mode 100644 index 000000000..b5938b704 --- /dev/null +++ 
b/test/gen_NISTKAT.c @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: Apache-2.0 + +#include +#include + +#include "kem.h" +#include "randombytes.h" + +static void fprintBstr(FILE *fp, const char *S, const uint8_t *A, size_t L) { + size_t i; + fprintf(fp, "%s", S); + for (i = 0; i < L; i++) { + fprintf(fp, "%02X", A[i]); + } + if (L == 0) { + fprintf(fp, "00"); + } + fprintf(fp, "\n"); +} + +static void randombytes_nth(uint8_t *seed, size_t nth, size_t len) { + uint8_t entropy_input[48] = {0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47 + }; + nist_kat_init(entropy_input, NULL, 256); + + for (size_t i = 0; i < nth + 1; i++) { + randombytes(seed, len); + } +} + +int main(void) { + uint8_t seed[48]; + FILE *fh = stdout; + uint8_t public_key[CRYPTO_PUBLICKEYBYTES]; + uint8_t secret_key[CRYPTO_SECRETKEYBYTES]; + uint8_t ciphertext[CRYPTO_CIPHERTEXTBYTES]; + uint8_t shared_secret_e[CRYPTO_BYTES]; + uint8_t shared_secret_d[CRYPTO_BYTES]; + int rc; + + int count = 0; + + fprintf(fh, "# %s\n\n", CRYPTO_ALGNAME); + + do { + fprintf(fh, "count = %d\n", count); + randombytes_nth(seed, count, 48); + fprintBstr(fh, "seed = ", seed, 48); + + nist_kat_init(seed, NULL, 256); + + rc = crypto_kem_keypair(public_key, secret_key); + if (rc != 0) { + fprintf(stderr, "[kat_kem] %s ERROR: crypto_kem_keypair failed!\n", CRYPTO_ALGNAME); + return -1; + } + fprintBstr(fh, "pk = ", public_key, CRYPTO_PUBLICKEYBYTES); + fprintBstr(fh, "sk = ", secret_key, CRYPTO_SECRETKEYBYTES); + + rc = crypto_kem_enc(ciphertext, shared_secret_e, public_key); + if (rc != 0) { + fprintf(stderr, "[kat_kem] %s ERROR: crypto_kem_enc failed!\n", CRYPTO_ALGNAME); + return -2; + } + fprintBstr(fh, "ct = ", ciphertext, CRYPTO_CIPHERTEXTBYTES); + fprintBstr(fh, "ss = ", shared_secret_e, CRYPTO_BYTES); + fprintf(fh, "\n"); + + rc = crypto_kem_dec(shared_secret_d, ciphertext, 
secret_key); + if (rc != 0) { + fprintf(stderr, "[kat_kem] %s ERROR: crypto_kem_dec failed!\n", CRYPTO_ALGNAME); + return -3; + } + + rc = memcmp(shared_secret_e, shared_secret_d, CRYPTO_BYTES); + if (rc != 0) { + fprintf(stderr, "[kat_kem] %s ERROR: shared secrets are not equal\n", CRYPTO_ALGNAME); + return -4; + } + count++; + } while (count < 100); + + return 0; +} diff --git a/test/nistrng/aes.c b/test/nistrng/aes.c new file mode 100644 index 000000000..eb57fe3b2 --- /dev/null +++ b/test/nistrng/aes.c @@ -0,0 +1,627 @@ +// SPDX-License-Identifier: MIT + +/* + * AES implementation based on code from BearSSL (https://bearssl.org/) + * by Thomas Pornin. + * + * + * Copyright (c) 2016 Thomas Pornin + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be + * included in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "aes.h" + +static inline uint32_t br_dec32le(const unsigned char *src) { + return (uint32_t)src[0] | ((uint32_t)src[1] << 8) | ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24); +} + +static void br_range_dec32le(uint32_t *v, size_t num, const unsigned char *src) { + while (num-- > 0) { + *v++ = br_dec32le(src); + src += 4; + } +} + +static inline uint32_t br_swap32(uint32_t x) { + x = ((x & (uint32_t)0x00FF00FF) << 8) | ((x >> 8) & (uint32_t)0x00FF00FF); + return (x << 16) | (x >> 16); +} + +static inline void br_enc32le(unsigned char *dst, uint32_t x) { + dst[0] = (unsigned char)x; + dst[1] = (unsigned char)(x >> 8); + dst[2] = (unsigned char)(x >> 16); + dst[3] = (unsigned char)(x >> 24); +} + +static void br_range_enc32le(unsigned char *dst, const uint32_t *v, size_t num) { + while (num-- > 0) { + br_enc32le(dst, *v++); + dst += 4; + } +} + +static void br_aes_ct64_bitslice_Sbox(uint64_t *q) { + /* + * This S-box implementation is a straightforward translation of + * the circuit described by Boyar and Peralta in "A new + * combinational logic minimization technique with applications + * to cryptology" (https://eprint.iacr.org/2009/191.pdf). + * + * Note that variables x* (input) and s* (output) are numbered + * in "reverse" order (x0 is the high bit, x7 is the low bit). 
+ */ + + uint64_t x0, x1, x2, x3, x4, x5, x6, x7; + uint64_t y1, y2, y3, y4, y5, y6, y7, y8, y9; + uint64_t y10, y11, y12, y13, y14, y15, y16, y17, y18, y19; + uint64_t y20, y21; + uint64_t z0, z1, z2, z3, z4, z5, z6, z7, z8, z9; + uint64_t z10, z11, z12, z13, z14, z15, z16, z17; + uint64_t t0, t1, t2, t3, t4, t5, t6, t7, t8, t9; + uint64_t t10, t11, t12, t13, t14, t15, t16, t17, t18, t19; + uint64_t t20, t21, t22, t23, t24, t25, t26, t27, t28, t29; + uint64_t t30, t31, t32, t33, t34, t35, t36, t37, t38, t39; + uint64_t t40, t41, t42, t43, t44, t45, t46, t47, t48, t49; + uint64_t t50, t51, t52, t53, t54, t55, t56, t57, t58, t59; + uint64_t t60, t61, t62, t63, t64, t65, t66, t67; + uint64_t s0, s1, s2, s3, s4, s5, s6, s7; + + x0 = q[7]; + x1 = q[6]; + x2 = q[5]; + x3 = q[4]; + x4 = q[3]; + x5 = q[2]; + x6 = q[1]; + x7 = q[0]; + + /* + * Top linear transformation. + */ + y14 = x3 ^ x5; + y13 = x0 ^ x6; + y9 = x0 ^ x3; + y8 = x0 ^ x5; + t0 = x1 ^ x2; + y1 = t0 ^ x7; + y4 = y1 ^ x3; + y12 = y13 ^ y14; + y2 = y1 ^ x0; + y5 = y1 ^ x6; + y3 = y5 ^ y8; + t1 = x4 ^ y12; + y15 = t1 ^ x5; + y20 = t1 ^ x1; + y6 = y15 ^ x7; + y10 = y15 ^ t0; + y11 = y20 ^ y9; + y7 = x7 ^ y11; + y17 = y10 ^ y11; + y19 = y10 ^ y8; + y16 = t0 ^ y11; + y21 = y13 ^ y16; + y18 = x0 ^ y16; + + /* + * Non-linear section. 
+ */ + t2 = y12 & y15; + t3 = y3 & y6; + t4 = t3 ^ t2; + t5 = y4 & x7; + t6 = t5 ^ t2; + t7 = y13 & y16; + t8 = y5 & y1; + t9 = t8 ^ t7; + t10 = y2 & y7; + t11 = t10 ^ t7; + t12 = y9 & y11; + t13 = y14 & y17; + t14 = t13 ^ t12; + t15 = y8 & y10; + t16 = t15 ^ t12; + t17 = t4 ^ t14; + t18 = t6 ^ t16; + t19 = t9 ^ t14; + t20 = t11 ^ t16; + t21 = t17 ^ y20; + t22 = t18 ^ y19; + t23 = t19 ^ y21; + t24 = t20 ^ y18; + + t25 = t21 ^ t22; + t26 = t21 & t23; + t27 = t24 ^ t26; + t28 = t25 & t27; + t29 = t28 ^ t22; + t30 = t23 ^ t24; + t31 = t22 ^ t26; + t32 = t31 & t30; + t33 = t32 ^ t24; + t34 = t23 ^ t33; + t35 = t27 ^ t33; + t36 = t24 & t35; + t37 = t36 ^ t34; + t38 = t27 ^ t36; + t39 = t29 & t38; + t40 = t25 ^ t39; + + t41 = t40 ^ t37; + t42 = t29 ^ t33; + t43 = t29 ^ t40; + t44 = t33 ^ t37; + t45 = t42 ^ t41; + z0 = t44 & y15; + z1 = t37 & y6; + z2 = t33 & x7; + z3 = t43 & y16; + z4 = t40 & y1; + z5 = t29 & y7; + z6 = t42 & y11; + z7 = t45 & y17; + z8 = t41 & y10; + z9 = t44 & y12; + z10 = t37 & y3; + z11 = t33 & y4; + z12 = t43 & y13; + z13 = t40 & y5; + z14 = t29 & y2; + z15 = t42 & y9; + z16 = t45 & y14; + z17 = t41 & y8; + + /* + * Bottom linear transformation. 
+ */ + t46 = z15 ^ z16; + t47 = z10 ^ z11; + t48 = z5 ^ z13; + t49 = z9 ^ z10; + t50 = z2 ^ z12; + t51 = z2 ^ z5; + t52 = z7 ^ z8; + t53 = z0 ^ z3; + t54 = z6 ^ z7; + t55 = z16 ^ z17; + t56 = z12 ^ t48; + t57 = t50 ^ t53; + t58 = z4 ^ t46; + t59 = z3 ^ t54; + t60 = t46 ^ t57; + t61 = z14 ^ t57; + t62 = t52 ^ t58; + t63 = t49 ^ t58; + t64 = z4 ^ t59; + t65 = t61 ^ t62; + t66 = z1 ^ t63; + s0 = t59 ^ t63; + s6 = t56 ^ ~t62; + s7 = t48 ^ ~t60; + t67 = t64 ^ t65; + s3 = t53 ^ t66; + s4 = t51 ^ t66; + s5 = t47 ^ t65; + s1 = t64 ^ ~s3; + s2 = t55 ^ ~t67; + + q[7] = s0; + q[6] = s1; + q[5] = s2; + q[4] = s3; + q[3] = s4; + q[2] = s5; + q[1] = s6; + q[0] = s7; +} + +static void br_aes_ct64_ortho(uint64_t *q) { +#define SWAPN(cl, ch, s, x, y) \ + do \ + { \ + uint64_t a, b; \ + a = (x); \ + b = (y); \ + (x) = (a & (uint64_t)(cl)) | ((b & (uint64_t)(cl)) << (s)); \ + (y) = ((a & (uint64_t)(ch)) >> (s)) | (b & (uint64_t)(ch)); \ + } while (0) + +#define SWAP2(x, y) SWAPN(0x5555555555555555, 0xAAAAAAAAAAAAAAAA, 1, x, y) +#define SWAP4(x, y) SWAPN(0x3333333333333333, 0xCCCCCCCCCCCCCCCC, 2, x, y) +#define SWAP8(x, y) SWAPN(0x0F0F0F0F0F0F0F0F, 0xF0F0F0F0F0F0F0F0, 4, x, y) + + SWAP2(q[0], q[1]); + SWAP2(q[2], q[3]); + SWAP2(q[4], q[5]); + SWAP2(q[6], q[7]); + + SWAP4(q[0], q[2]); + SWAP4(q[1], q[3]); + SWAP4(q[4], q[6]); + SWAP4(q[5], q[7]); + + SWAP8(q[0], q[4]); + SWAP8(q[1], q[5]); + SWAP8(q[2], q[6]); + SWAP8(q[3], q[7]); +} + +static void br_aes_ct64_interleave_in(uint64_t *q0, uint64_t *q1, const uint32_t *w) { + uint64_t x0, x1, x2, x3; + + x0 = w[0]; + x1 = w[1]; + x2 = w[2]; + x3 = w[3]; + x0 |= (x0 << 16); + x1 |= (x1 << 16); + x2 |= (x2 << 16); + x3 |= (x3 << 16); + x0 &= (uint64_t)0x0000FFFF0000FFFF; + x1 &= (uint64_t)0x0000FFFF0000FFFF; + x2 &= (uint64_t)0x0000FFFF0000FFFF; + x3 &= (uint64_t)0x0000FFFF0000FFFF; + x0 |= (x0 << 8); + x1 |= (x1 << 8); + x2 |= (x2 << 8); + x3 |= (x3 << 8); + x0 &= (uint64_t)0x00FF00FF00FF00FF; + x1 &= (uint64_t)0x00FF00FF00FF00FF; + x2 &= 
(uint64_t)0x00FF00FF00FF00FF; + x3 &= (uint64_t)0x00FF00FF00FF00FF; + *q0 = x0 | (x2 << 8); + *q1 = x1 | (x3 << 8); +} + +static void br_aes_ct64_interleave_out(uint32_t *w, uint64_t q0, uint64_t q1) { + uint64_t x0, x1, x2, x3; + + x0 = q0 & (uint64_t)0x00FF00FF00FF00FF; + x1 = q1 & (uint64_t)0x00FF00FF00FF00FF; + x2 = (q0 >> 8) & (uint64_t)0x00FF00FF00FF00FF; + x3 = (q1 >> 8) & (uint64_t)0x00FF00FF00FF00FF; + x0 |= (x0 >> 8); + x1 |= (x1 >> 8); + x2 |= (x2 >> 8); + x3 |= (x3 >> 8); + x0 &= (uint64_t)0x0000FFFF0000FFFF; + x1 &= (uint64_t)0x0000FFFF0000FFFF; + x2 &= (uint64_t)0x0000FFFF0000FFFF; + x3 &= (uint64_t)0x0000FFFF0000FFFF; + w[0] = (uint32_t)x0 | (uint32_t)(x0 >> 16); + w[1] = (uint32_t)x1 | (uint32_t)(x1 >> 16); + w[2] = (uint32_t)x2 | (uint32_t)(x2 >> 16); + w[3] = (uint32_t)x3 | (uint32_t)(x3 >> 16); +} + +static const unsigned char Rcon[] = { + 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36 +}; + +static uint32_t sub_word(uint32_t x) { + uint64_t q[8]; + + memset(q, 0, sizeof q); + q[0] = x; + br_aes_ct64_ortho(q); + br_aes_ct64_bitslice_Sbox(q); + br_aes_ct64_ortho(q); + return (uint32_t)q[0]; +} + +static void br_aes_ct64_keysched(uint64_t *comp_skey, const unsigned char *key, unsigned int key_len) { + unsigned int i, j, k, nk, nkf; + uint32_t tmp; + uint32_t skey[60]; + unsigned nrounds = 10 + ((key_len - 16) >> 2); + + nk = (key_len >> 2); + nkf = ((nrounds + 1) << 2); + br_range_dec32le(skey, (key_len >> 2), key); + tmp = skey[(key_len >> 2) - 1]; + for (i = nk, j = 0, k = 0; i < nkf; i++) { + if (j == 0) { + tmp = (tmp << 24) | (tmp >> 8); + tmp = sub_word(tmp) ^ Rcon[k]; + } else if (nk > 6 && j == 4) { + tmp = sub_word(tmp); + } + tmp ^= skey[i - nk]; + skey[i] = tmp; + if (++j == nk) { + j = 0; + k++; + } + } + + for (i = 0, j = 0; i < nkf; i += 4, j += 2) { + uint64_t q[8]; + + br_aes_ct64_interleave_in(&q[0], &q[4], skey + i); + q[1] = q[0]; + q[2] = q[0]; + q[3] = q[0]; + q[5] = q[4]; + q[6] = q[4]; + q[7] = q[4]; + 
br_aes_ct64_ortho(q); + comp_skey[j + 0] = + (q[0] & (uint64_t)0x1111111111111111) | (q[1] & (uint64_t)0x2222222222222222) | (q[2] & (uint64_t)0x4444444444444444) | (q[3] & (uint64_t)0x8888888888888888); + comp_skey[j + 1] = + (q[4] & (uint64_t)0x1111111111111111) | (q[5] & (uint64_t)0x2222222222222222) | (q[6] & (uint64_t)0x4444444444444444) | (q[7] & (uint64_t)0x8888888888888888); + } +} + +static void br_aes_ct64_skey_expand(uint64_t *skey, const uint64_t *comp_skey, unsigned int nrounds) { + unsigned u, v, n; + + n = (nrounds + 1) << 1; + for (u = 0, v = 0; u < n; u++, v += 4) { + uint64_t x0, x1, x2, x3; + + x0 = x1 = x2 = x3 = comp_skey[u]; + x0 &= (uint64_t)0x1111111111111111; + x1 &= (uint64_t)0x2222222222222222; + x2 &= (uint64_t)0x4444444444444444; + x3 &= (uint64_t)0x8888888888888888; + x1 >>= 1; + x2 >>= 2; + x3 >>= 3; + skey[v + 0] = (x0 << 4) - x0; + skey[v + 1] = (x1 << 4) - x1; + skey[v + 2] = (x2 << 4) - x2; + skey[v + 3] = (x3 << 4) - x3; + } +} + +static inline void add_round_key(uint64_t *q, const uint64_t *sk) { + q[0] ^= sk[0]; + q[1] ^= sk[1]; + q[2] ^= sk[2]; + q[3] ^= sk[3]; + q[4] ^= sk[4]; + q[5] ^= sk[5]; + q[6] ^= sk[6]; + q[7] ^= sk[7]; +} + +static inline void shift_rows(uint64_t *q) { + int i; + + for (i = 0; i < 8; i++) { + uint64_t x; + + x = q[i]; + q[i] = (x & (uint64_t)0x000000000000FFFF) | ((x & (uint64_t)0x00000000FFF00000) >> 4) | ((x & (uint64_t)0x00000000000F0000) << 12) | ((x & (uint64_t)0x0000FF0000000000) >> 8) | ((x & (uint64_t)0x000000FF00000000) << 8) | ((x & (uint64_t)0xF000000000000000) >> 12) | ((x & (uint64_t)0x0FFF000000000000) << 4); + } +} + +static inline uint64_t rotr32(uint64_t x) { + return (x << 32) | (x >> 32); +} + +static inline void mix_columns(uint64_t *q) { + uint64_t q0, q1, q2, q3, q4, q5, q6, q7; + uint64_t r0, r1, r2, r3, r4, r5, r6, r7; + + q0 = q[0]; + q1 = q[1]; + q2 = q[2]; + q3 = q[3]; + q4 = q[4]; + q5 = q[5]; + q6 = q[6]; + q7 = q[7]; + r0 = (q0 >> 16) | (q0 << 48); + r1 = (q1 >> 16) | (q1 
<< 48); + r2 = (q2 >> 16) | (q2 << 48); + r3 = (q3 >> 16) | (q3 << 48); + r4 = (q4 >> 16) | (q4 << 48); + r5 = (q5 >> 16) | (q5 << 48); + r6 = (q6 >> 16) | (q6 << 48); + r7 = (q7 >> 16) | (q7 << 48); + + q[0] = q7 ^ r7 ^ r0 ^ rotr32(q0 ^ r0); + q[1] = q0 ^ r0 ^ q7 ^ r7 ^ r1 ^ rotr32(q1 ^ r1); + q[2] = q1 ^ r1 ^ r2 ^ rotr32(q2 ^ r2); + q[3] = q2 ^ r2 ^ q7 ^ r7 ^ r3 ^ rotr32(q3 ^ r3); + q[4] = q3 ^ r3 ^ q7 ^ r7 ^ r4 ^ rotr32(q4 ^ r4); + q[5] = q4 ^ r4 ^ r5 ^ rotr32(q5 ^ r5); + q[6] = q5 ^ r5 ^ r6 ^ rotr32(q6 ^ r6); + q[7] = q6 ^ r6 ^ r7 ^ rotr32(q7 ^ r7); +} + +static void inc4_be(uint32_t *x) { + uint32_t t = br_swap32(*x) + 4; + *x = br_swap32(t); +} + +static void aes_ecb4x(unsigned char out[64], const uint32_t ivw[16], const uint64_t *sk_exp, unsigned int nrounds) { + uint32_t w[16]; + uint64_t q[8]; + unsigned int i; + + memcpy(w, ivw, sizeof(w)); + for (i = 0; i < 4; i++) { + br_aes_ct64_interleave_in(&q[i], &q[i + 4], w + (i << 2)); + } + br_aes_ct64_ortho(q); + + add_round_key(q, sk_exp); + for (i = 1; i < nrounds; i++) { + br_aes_ct64_bitslice_Sbox(q); + shift_rows(q); + mix_columns(q); + add_round_key(q, sk_exp + (i << 3)); + } + br_aes_ct64_bitslice_Sbox(q); + shift_rows(q); + add_round_key(q, sk_exp + 8 * nrounds); + + br_aes_ct64_ortho(q); + for (i = 0; i < 4; i++) { + br_aes_ct64_interleave_out(w + (i << 2), q[i], q[i + 4]); + } + br_range_enc32le(out, w, 16); +} + +static void aes_ctr4x(unsigned char out[64], uint32_t ivw[16], const uint64_t *sk_exp, unsigned int nrounds) { + aes_ecb4x(out, ivw, sk_exp, nrounds); + + /* Increase counter for next 4 blocks */ + inc4_be(ivw + 3); + inc4_be(ivw + 7); + inc4_be(ivw + 11); + inc4_be(ivw + 15); +} + +static void aes_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const uint64_t *rkeys, unsigned int nrounds) { + uint32_t blocks[16]; + unsigned char t[64]; + + while (nblocks >= 4) { + br_range_dec32le(blocks, 16, in); + aes_ecb4x(out, blocks, rkeys, nrounds); + nblocks -= 4; + in += 64; + out 
+= 64; + } + + if (nblocks) { + br_range_dec32le(blocks, nblocks * 4, in); + aes_ecb4x(t, blocks, rkeys, nrounds); + memcpy(out, t, nblocks * 16); + } +} + +static void aes_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const uint64_t *rkeys, unsigned int nrounds) { + uint32_t ivw[16]; + size_t i; + uint32_t cc = 0; + + br_range_dec32le(ivw, 3, iv); + memcpy(ivw + 4, ivw, 3 * sizeof(uint32_t)); + memcpy(ivw + 8, ivw, 3 * sizeof(uint32_t)); + memcpy(ivw + 12, ivw, 3 * sizeof(uint32_t)); + ivw[3] = br_swap32(cc); + ivw[7] = br_swap32(cc + 1); + ivw[11] = br_swap32(cc + 2); + ivw[15] = br_swap32(cc + 3); + + while (outlen > 64) { + aes_ctr4x(out, ivw, rkeys, nrounds); + out += 64; + outlen -= 64; + } + if (outlen > 0) { + unsigned char tmp[64]; + aes_ctr4x(tmp, ivw, rkeys, nrounds); + for (i = 0; i < outlen; i++) { + out[i] = tmp[i]; + } + } +} + +void aes128_ecb_keyexp(aes128ctx *r, const unsigned char *key) { + uint64_t skey[22]; + + r->sk_exp = malloc(sizeof(uint64_t) * PQC_AES128_STATESIZE); + if (r->sk_exp == NULL) { + exit(111); + } + + br_aes_ct64_keysched(skey, key, 16); + br_aes_ct64_skey_expand(r->sk_exp, skey, 10); +} + +void aes128_ctr_keyexp(aes128ctx *r, const unsigned char *key) { + aes128_ecb_keyexp(r, key); +} + +void aes192_ecb_keyexp(aes192ctx *r, const unsigned char *key) { + uint64_t skey[26]; + r->sk_exp = malloc(sizeof(uint64_t) * PQC_AES192_STATESIZE); + if (r->sk_exp == NULL) { + exit(111); + } + + br_aes_ct64_keysched(skey, key, 24); + br_aes_ct64_skey_expand(r->sk_exp, skey, 12); +} + +void aes192_ctr_keyexp(aes192ctx *r, const unsigned char *key) { + aes192_ecb_keyexp(r, key); +} + +void aes256_ecb_keyexp(aes256ctx *r, const unsigned char *key) { + uint64_t skey[30]; + r->sk_exp = malloc(sizeof(uint64_t) * PQC_AES256_STATESIZE); + if (r->sk_exp == NULL) { + exit(111); + } + + br_aes_ct64_keysched(skey, key, 32); + br_aes_ct64_skey_expand(r->sk_exp, skey, 14); +} + +void aes256_ctr_keyexp(aes256ctx *r, const unsigned char 
*key) { + aes256_ecb_keyexp(r, key); +} + +void aes128_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes128ctx *ctx) { + aes_ecb(out, in, nblocks, ctx->sk_exp, 10); +} + +void aes128_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes128ctx *ctx) { + aes_ctr(out, outlen, iv, ctx->sk_exp, 10); +} + +void aes192_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes192ctx *ctx) { + aes_ecb(out, in, nblocks, ctx->sk_exp, 12); +} + +void aes192_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes192ctx *ctx) { + aes_ctr(out, outlen, iv, ctx->sk_exp, 12); +} + +void aes256_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes256ctx *ctx) { + aes_ecb(out, in, nblocks, ctx->sk_exp, 14); +} + +void aes256_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes256ctx *ctx) { + aes_ctr(out, outlen, iv, ctx->sk_exp, 14); +} + +void aes128_ctx_release(aes128ctx *r) { + free(r->sk_exp); +} + +void aes192_ctx_release(aes192ctx *r) { + free(r->sk_exp); +} + +void aes256_ctx_release(aes256ctx *r) { + free(r->sk_exp); +} diff --git a/test/nistrng/aes.h b/test/nistrng/aes.h new file mode 100644 index 000000000..80a9abd9e --- /dev/null +++ b/test/nistrng/aes.h @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: MIT + +#ifndef AES_H +#define AES_H + +#include +#include + +#define AES128_KEYBYTES 16 +#define AES192_KEYBYTES 24 +#define AES256_KEYBYTES 32 +#define AESCTR_NONCEBYTES 12 +#define AES_BLOCKBYTES 16 + +// We've put these states on the heap to make sure ctx_release is used. 
+#define PQC_AES128_STATESIZE 88 +typedef struct { + uint64_t *sk_exp; +} aes128ctx; + +#define PQC_AES192_STATESIZE 104 +typedef struct { + uint64_t *sk_exp; +} aes192ctx; + +#define PQC_AES256_STATESIZE 120 +typedef struct { + uint64_t *sk_exp; +} aes256ctx; + +/** Initializes the context **/ +void aes128_ecb_keyexp(aes128ctx *r, const unsigned char *key); + +void aes128_ctr_keyexp(aes128ctx *r, const unsigned char *key); + +void aes128_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes128ctx *ctx); + +void aes128_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes128ctx *ctx); + +/** Frees the context **/ +void aes128_ctx_release(aes128ctx *r); + +/** Initializes the context **/ +void aes192_ecb_keyexp(aes192ctx *r, const unsigned char *key); + +void aes192_ctr_keyexp(aes192ctx *r, const unsigned char *key); + +void aes192_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes192ctx *ctx); + +void aes192_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes192ctx *ctx); + +void aes192_ctx_release(aes192ctx *r); + +/** Initializes the context **/ +void aes256_ecb_keyexp(aes256ctx *r, const unsigned char *key); + +void aes256_ctr_keyexp(aes256ctx *r, const unsigned char *key); + +void aes256_ecb(unsigned char *out, const unsigned char *in, size_t nblocks, const aes256ctx *ctx); + +void aes256_ctr(unsigned char *out, size_t outlen, const unsigned char *iv, const aes256ctx *ctx); + +/** Frees the context **/ +void aes256_ctx_release(aes256ctx *r); + +#endif diff --git a/test/nistrng/randombytes.h b/test/nistrng/randombytes.h new file mode 100644 index 000000000..0e5499cb7 --- /dev/null +++ b/test/nistrng/randombytes.h @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: Apache-2.0 + +#ifndef RANDOMBYTES_H +#define RANDOMBYTES_H + +#include +#include "aes.h" + +void randombytes(uint8_t *buf, size_t n); + +void nist_kat_init(unsigned char entropy_input[AES256_KEYBYTES + AES_BLOCKBYTES], 
const unsigned char personalization_string[AES256_KEYBYTES + AES_BLOCKBYTES], int security_strength); + +#endif diff --git a/test/nistrng/rng.c b/test/nistrng/rng.c new file mode 100644 index 000000000..f8524d654 --- /dev/null +++ b/test/nistrng/rng.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: Apache-2.0 + +#include +#include +#include + +#include "aes.h" +#include "randombytes.h" + +typedef struct { + unsigned char key[AES256_KEYBYTES]; + unsigned char ctr[AES_BLOCKBYTES]; +} nistkatctx; + +static nistkatctx ctx; + +static void _aes256_ecb(unsigned char key[AES256_KEYBYTES], unsigned char ctr[AES_BLOCKBYTES], unsigned char buffer[AES_BLOCKBYTES]) { + aes256ctx aesctx; + aes256_ecb_keyexp(&aesctx, key); + aes256_ecb(buffer, ctr, 1, &aesctx); + aes256_ctx_release(&aesctx); +} + +static void aes256_block_update(uint8_t block[AES_BLOCKBYTES]) { + for (int j = AES_BLOCKBYTES - 1; j >= 0; j--) { + ctx.ctr[j]++; + + if (ctx.ctr[j] != 0x00) { + break; + } + } + + _aes256_ecb(ctx.key, ctx.ctr, block); +} + +static void nistkat_update(const unsigned char *provided_data, unsigned char *key, unsigned char *ctr) { + int len = AES256_KEYBYTES + AES_BLOCKBYTES; + uint8_t tmp[len]; + + for (int i = 0; i < len / AES_BLOCKBYTES; i++) { + aes256_block_update(tmp + AES_BLOCKBYTES * i); + } + + if (provided_data) { + for (int i = 0; i < len; i++) { + tmp[i] ^= provided_data[i]; + } + } + + memcpy(key, tmp, AES256_KEYBYTES); + memcpy(ctr, tmp + AES256_KEYBYTES, AES_BLOCKBYTES); +} + +void nist_kat_init(unsigned char entropy_input[AES256_KEYBYTES + AES_BLOCKBYTES], const unsigned char personalization_string[AES256_KEYBYTES + AES_BLOCKBYTES], int security_strength) { + int len = AES256_KEYBYTES + AES_BLOCKBYTES; + uint8_t seed_material[len]; + (void) security_strength; + + memcpy(seed_material, entropy_input, len); + if (personalization_string) { + for (int i = 0; i < len; i++) { + seed_material[i] ^= personalization_string[i]; + } + } + memset(ctx.key, 0x00, AES256_KEYBYTES); + 
memset(ctx.ctr, 0x00, AES_BLOCKBYTES); + nistkat_update(seed_material, ctx.key, ctx.ctr); +} + +void randombytes(uint8_t *buf, size_t n) { + uint8_t block[AES_BLOCKBYTES]; + + size_t nb = n / AES_BLOCKBYTES; + size_t tail = n % AES_BLOCKBYTES; + + for (size_t i = 0; i < nb; i++) { + aes256_block_update(block); + memcpy(buf + i * AES_BLOCKBYTES, block, AES_BLOCKBYTES); + } + + if (tail > 0) { + aes256_block_update(block); + memcpy(buf + nb * AES_BLOCKBYTES, block, tail); + } + + nistkat_update(NULL, ctx.key, ctx.ctr); +} From 34ee166d9e3b3d1b07a6a91385ac7ef537f87b53 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 13:15:07 +0800 Subject: [PATCH 3/9] Update README.md Co-authored-by: Hanno Becker Signed-off-by: Matthias J. Kannwischer --- README.md | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f4af653d5..94d76a784 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,20 @@ [//]: # (SPDX-License-Identifier: CC-BY-4.0) -**MLKEM-C-AArch64** is a collection of [MLKEM](https://doi.org/10.6028/NIST.FIPS.203.ipd) implementations optimized for number of different Armv8-A and Armv9-A microarchitectures. +**MLKEM-C-AArch64** is a collection of [MLKEM](https://doi.org/10.6028/NIST.FIPS.203.ipd) implementations for CPUs based on the Armv8-A and Armv9-A architectures. + +There is a wide spectrum of implementations of the Armv8-A and Armv9-A architectures, ranging from efficiency-focused in-order cores to performance-centric highly out-of-order cores. Depending on a CPU's placement on this spectrum, its optimal MLKEM implementation will vary: Code that performs well on Apple M1 may not perform well on Cortex-A55, or vice versa. + +MLKEM-C-AArch64 aims to provide a portfolio of implementations covering most Armv8-A/Armv9-A microarchitectures, plus code optimized for specific microarchitectures. 
+ +Initially, our benchmarking platforms are: +- Arm Cortex-A53 (as used in the Raspberry Pi3) +- Arm Cortex-A55 +- Arm Cortex-A72 (as used in the Raspberry Pi4) +- Arm Cortex-A76 (as used in the Raspberry Pi5) / Neoverse N1 (as used in AWS Graviton2/c6g instances) +- Arm Neoverse-V1 (as used in the AWS Graviton3/c7g instances) +- Apple M1 + +Please reach out to the MLKEM-C-AArch64 maintainers or open an issue if you would like to see benchmarking on other microarchitectures. Initially the primary target platforms are: - Arm Cortex-A72 (as used in the Raspberry Pi4) From 6e1fe80b37375923db2d9251db9b7178e4963f8f Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 13:15:23 +0800 Subject: [PATCH 4/9] Update README.md Co-authored-by: Hanno Becker Signed-off-by: Matthias J. Kannwischer --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 94d76a784..9201be783 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ The goals of this project are as follows: **MLKEM-C-AArch64** is currently a work in progress and we do not recommend relying on it at this point. **WE DO NOT CURRENTLY RECOMMEND RELYING ON THIS LIBRARY IN A PRODUCTION ENVIRONMENT OR TO PROTECT ANY SENSITIVE DATA.** -Once we have the first stable version, this notice will be removed +Once we have the first stable version, this notice will be removed. The current code is compatible with the [`standard` branch of the official MLKEM repository](https://github.com/pq-crystals/kyber/tree/standard). From 8885a3bb1a2e0ed87628ac656df3d2a3f67161a7 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 13:17:31 +0800 Subject: [PATCH 5/9] Update README.md Signed-off-by: Matthias J. 
Kannwischer --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9201be783..27a0a847c 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ The goals of this project are as follows: - Achieve performance matching the state-of-the-art on the target platforms. - Maintainability should not be sacrificed and assembly should be as readable as possible. We make use of automated tooling for microarchitecture-specific optimization (e.g., by using [SLOTHY](https://slothy-optimizer.github.io/slothy/)). - Provide a unified interface for Keccak implementations allowing 2-way, 4-way, and 8-way parallel implementations depending on the target microarchitecture. -- Eventually, we aim to unify the implementations with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). However, we believe that for AArch64, there are too relevant microarchitectures to come up with a single implementation that performs well on all. +- Eventually, we aim to unify the implementations with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). However, we believe that for AArch64, there are too many relevant microarchitectures to come up with a single implementation that performs well on all. ## Current state From e1ce379d7ad8d371a44c8f49e6dcb8096fd7be79 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 19:21:55 +0800 Subject: [PATCH 6/9] Update README.md Co-authored-by: Hanno Becker Signed-off-by: Matthias J. Kannwischer --- README.md | 37 +++++++++++++++++-------------------- 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/README.md b/README.md index 27a0a847c..d4d37f6b8 100644 --- a/README.md +++ b/README.md @@ -2,38 +2,35 @@ **MLKEM-C-AArch64** is a collection of [MLKEM](https://doi.org/10.6028/NIST.FIPS.203.ipd) implementations for CPUs based on the Armv8-A and Armv9-A architectures. 
-There is a wide spectrum of implementations of the Armv8-A and Armv9-A architectures, ranging from efficiency-focused in-order cores to performance-centric highly out-of-order cores. Depending on a CPU's placement on this spectrum, its optimal MLKEM implementation will vary: Code that performs well on Apple M1 may not perform well on Cortex-A55, or vice versa. +## Goals of MLKEM-C-AArch64 + +The primary goals of this project are as follows: +- _Assurance:_ Clean code that's extensively tested and amenable for audit and verification +- _Ease of use:_ Permissive licensing, modularity, few dependencies +- _Performance:_ Competitive performance for most Armv8-A/Armv9-A platforms + +There are tensions between these goals: +- Optimal code is target-specific, but a large variety of CPU-specific implementations makes a library harder to both use and maintain. +- Optimal code is complex (e.g. relying on handwritten assembly), impeding maintenance and amenability for audit or verification. -MLKEM-C-AArch64 aims to provide a portfolio of implementations covering most Armv8-A/Armv9-A microarchitectures, plus code optimized for specific microarchitectures. +In doubt, MLKEM-C-AArch64 chooses assurance and ease of use over performance: We only include implementations into MLKEM-C-AArch64 which are manually auditable or (ideally _and_) for which we see a path towards formal verification. We prefer assembly over intrinsics (for better control over register allocation and instruction scheduling), but all assembly should be as readable as possible and micro-optimization ideally deferred to automated tooling such as [SLOTHY](https://slothy-optimizer.github.io/slothy/). Ultimately, MLKEM-C-AArch64 strives for constant-time implementations for which the C-code is, at minimum, verified to be free of undefined behaviour, and where all assembly is functionally verified.
-Initially, our benchmarking platforms are: -- Arm Cortex-A53 (as used in the Raspberry Pi3) +MLKEM-C-AArch64 aims to provide a portfolio of implementations jointly providing competitive performance for most Armv8-A/Armv9-A microarchitectures. For some specific microarchitectures of particular interest, MLKEM-C-AArch64 may also provide CPU-specific implementations. Initially, our benchmarking platforms are: - Arm Cortex-A55 - Arm Cortex-A72 (as used in the Raspberry Pi4) - Arm Cortex-A76 (as used in the Raspberry Pi5) / Neoverse N1 (as used in AWS Graviton2/c6g instances) -- Arm Neoverse-V1 (as used in the AWS Graviton3/c7g instances) +- Arm Neoverse V1 (as used in the AWS Graviton3/c7g instances) - Apple M1 Please reach out to the MLKEM-C-AArch64 maintainers or open an issue if you would like to see benchmarking on other microarchitectures. -Initially the primary target platforms are: - - Arm Cortex-A72 (as used in the Raspberry Pi4) - - Apple M1 - - AWS [Graviton 4](https://press.aboutamazon.com/2023/11/aws-unveils-next-generation-aws-designed-chips) instances based on [Arm Neoverse V2](https://developer.arm.com/Processors/Neoverse%20V2) +## Non-goals +At this point, we do not provide implementations optimized for memory usage (code / RAM). If you need a memory-optimized implementation and the implementation provided by MLKEM-C-Generic is not of sufficient performance to your application, please contact us. -## Goals of MLKEM-C-AArch64 - -The goals of this project are as follows: +## Relation to MLKEM-C-Generic -- Provide production-grade code that can be dropped into other projects. -- Being permissibly licensed with all code coming with an Apache-2.0 license. -- Tested against the official reference known-answer tests (KATs) and extended KATs (taken from another [PQCP](https://github.com/pq-code-package) project). -- Include Neon assembly implementations of the core building blocks of MLKEM performing well on a wide range of Armv8-A and Armv9-A platforms. 
-- Achieve performance matching the state-of-the-art on the target platforms. -- Maintainability should not be sacrificed and assembly should be as readable as possible. We make use of automated tooling for microarchitecture-specific optimization (e.g., by using [SLOTHY](https://slothy-optimizer.github.io/slothy/)). -- Provide a unified interface for Keccak implementations allowing 2-way, 4-way, and 8-way parallel implementations depending on the target microarchitecture. -- Eventually, we aim to unify the implementations with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). However, we believe that for AArch64, there are too many relevant microarchitectures to come up with a single implementation that performs well on all. +Eventually, we aim to unify the (shared) C-part of the implementations provided by MLKEM-C-AArch64 with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). Initially, however, we will allow some divergence, e.g. to explore interfaces to 2-/4-/8-way parallel Keccak implementations which are essential for high-performance implementations of MLKEM. ## Current state From 143efe167ffac32651ced507206074b340e79308 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 18:24:13 +0800 Subject: [PATCH 7/9] Update MAINTAINERS.md (#32) * Add Matthias and Hanno as maintainers Signed-off-by: Matthias J. Kannwischer * Add Duc Tri Nguyen to MAINTAINERS.md Co-authored-by: cothan Signed-off-by: Matthias J. Kannwischer --------- Signed-off-by: Matthias J. 
Kannwischer Co-authored-by: cothan --- MAINTAINERS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS.md b/MAINTAINERS.md index 11d2d119c..d18fb02e5 100644 --- a/MAINTAINERS.md +++ b/MAINTAINERS.md @@ -1,5 +1,4 @@ [//]: # (SPDX-License-Identifier: CC-BY-4.0) -[//]: # (TODO Update list of maintainers) # Maintainers @@ -7,5 +6,6 @@ | Name | GitHub | Chat | Affiliation |-------------------------|-------------------------------------------|----------------|---------------------- -| Nigel Jones | [planetf1](https://github.com/planetf1) | planetf1 | IBM - +| Hanno Becker | [hanno-becker](https://github.com/hanno-becker) | | AWS | +| Matthias J. Kannwischer | [mkannwischer](https://github.com/mkannwischer) | matthiaskannwischer | Chelpis Quantum Tech | +| Duc Tri Nguyen | [cothan](https://github.com/cothan) | cothan | CERG @ GMU | From cecd5e10bbe894ad7fcd897577d83d0a58c46e45 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 19:23:59 +0800 Subject: [PATCH 8/9] Update README.md Signed-off-by: Matthias J. Kannwischer --- README.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index d4d37f6b8..c7b2eab68 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ [//]: # (SPDX-License-Identifier: CC-BY-4.0) -**MLKEM-C-AArch64** is a collection of [MLKEM](https://doi.org/10.6028/NIST.FIPS.203.ipd) implementations for CPUs based on the Armv8-A and Armv9-A architectures.
-## Goals of MLKEM-C-AArch64 +## Goals of mlkem-c-aarch64 The primary goals of this project are as follows: - _Assurance:_ Clean code that's extensively tested and amenable for audit and verification @@ -13,16 +13,16 @@ There are tensions between these goals: - Optimal code is target-specific, but a large variety of CPU-specific implementations makes a library harder to both use and maintain. - Optimal code is complex (e.g. relying on handwritten assembly), impeding maintenance and amenability for audit or verification. -In doubt, MLKEM-C-AArch64 chooses assurance and ease of use over performance: We only include implementations into MLKEM-C-AArch64 which are manually auditable or (ideally _and_) for which we see a path towards formal verification. We prefer assembly over intrinsics (for better control over register allocation and instruction scheduling), but all assembly should be as readable as possible and micro-optimization ideally deferred to automated tooling such as [SLOTHY](https://slothy-optimizer.github.io/slothy/). Ultimately, MLKEM-C-AArch64 strives for constant-time implementations for which the C-code is, at minimum, verified to be free of undefined behaviour, and where all assembly is functionally verified. +In doubt, **mlkem-c-aarch64** chooses assurance and ease of use over performance: We only include implementations into **mlkem-c-aarch64** which are manually auditable or (ideally _and_) for which we see a path towards formal verification. We prefer assembly over intrinsics (for better control over register allocation and instruction scheduling), but all assembly should be as readable as possible and micro-optimization ideally deferred to automated tooling such as [SLOTHY](https://slothy-optimizer.github.io/slothy/). Ultimately, **mlkem-c-aarch64** strives for constant-time implementations for which the C-code is, at minimum, verified to be free of undefined behaviour, and where all assembly is functionally verified.
-MLKEM-C-AArch64 aims to provide a portfolio of implementations jointly providing competitive performance for most Armv8-A/Armv9-A microarchitectures. For some specific microarchitectures of particular interest, MLKEM-C-AArch64 may also provide CPU-specific implementations. Initially, our benchmarking platforms are: +**mlkem-c-aarch64** aims to provide a portfolio of implementations jointly providing competitive performance for most Armv8-A/Armv9-A microarchitectures. For some specific microarchitectures of particular interest, **mlkem-c-aarch64** may also provide CPU-specific implementations. Initially, our benchmarking platforms are: - Arm Cortex-A55 - Arm Cortex-A72 (as used in the Raspberry Pi4) - Arm Cortex-A76 (as used in the Raspberry Pi5) / Neoverse N1 (as used in AWS Graviton2/c6g instances) - Arm Neoverse V1 (as used in the AWS Graviton3/c7g instances) - Apple M1 -Please reach out to the MLKEM-C-AArch64 maintainers or open an issue if you would like to see benchmarking on other microarchitectures. +Please reach out to the **mlkem-c-aarch64** maintainers or open an issue if you would like to see benchmarking on other microarchitectures. ## Non-goals @@ -30,12 +30,12 @@ At this point, we do not provide implementations optimized for memory usage (cod ## Relation to MLKEM-C-Generic -Eventually, we aim to unify the (shared) C-part of the implementations provided by MLKEM-C-AArch64 with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). Initially, however, we will allow some divergence, e.g. to explore interfaces to 2-/4-/8-way parallel Keccak implementations which are essential for high-performance implementations of MLKEM. +Eventually, we aim to unify the (shared) C-part of the implementations provided by **mlkem-c-aarch64** with the implementations in [mlkem-c-generic](https://github.com/pq-code-package/mlkem-c-generic). Initially, however, we will allow some divergence, e.g. 
to explore interfaces to 2-/4-/8-way parallel Keccak implementations which are essential for high-performance implementations of MLKEM. ## Current state -**MLKEM-C-AArch64** is currently a work in progress and we do not recommend relying on it at this point. +**mlkem-c-aarch64** is currently a work in progress and we do not recommend relying on it at this point. **WE DO NOT CURRENTLY RECOMMEND RELYING ON THIS LIBRARY IN A PRODUCTION ENVIRONMENT OR TO PROTECT ANY SENSITIVE DATA.** Once we have the first stable version, this notice will be removed. @@ -43,11 +43,11 @@ The current code is compatible with the [`standard` branch of the official MLKEM ## Call for contributors -We are actively seeking contributors who can help us build **MLKEM-C-AArch64**. +We are actively seeking contributors who can help us build **mlkem-c-aarch64**. If you are interested, please contact us, or volunteer for any of the open issues. ## Call for potential consumers -If you are a potential consumer of **MLKEM-C-AArch64**, please reach out to us. -We're interested in hearing the way you are considering using **MLKEM-C-AArch64** and could benefit from additional features. +If you are a potential consumer of **mlkem-c-aarch64**, please reach out to us. +We're interested in hearing the way you are considering using **mlkem-c-aarch64** and could benefit from additional features. If you have specific feature requests, please open an issue. From 44aa01e07b84e734c90c677d14896175e6f02a53 Mon Sep 17 00:00:00 2001 From: "Matthias J. Kannwischer" Date: Thu, 9 May 2024 19:48:36 +0800 Subject: [PATCH 9/9] Update README.md Signed-off-by: Matthias J. 
Kannwischer --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c7b2eab68..f3df441ba 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ There are tensions between these goals: - Optimal code is target-specific, but a large variety of CPU-specific implementations makes a library harder to both use and maintain. - Optimal code is complex (e.g. relying on handwritten assembly), impeding maintenance and amenability for audit or verification. -In doubt, **mlkem-c-aarch64** chooses assurance and ease of use over performance: We only include implementations into **mlkem-c-aarch64** which are manually auditable or (ideally _and_) for which we see a path towards formal verification. We prefer assembly over intrinsics (for better control over register allocation and instruction scheduling), but all assembly should be as readable as possible and micro-optimization ideally deferred to automated tooling such as [SLOTHY](https://slothy-optimizer.github.io/slothy/). Ultimately, **mlkem-c-aarch64** strives for constant-time implementations for which the C-code is, at minimum, verified to be free of undefined behaviour, and where all assembly is functionally verified. +In doubt, **mlkem-c-aarch64** chooses assurance and ease of use over performance: We only include implementations into **mlkem-c-aarch64** which are manually auditable or (ideally _and_) for which we see a path towards formal verification. All assembly should be as readable as possible and micro-optimization ideally deferred to automated tooling such as [SLOTHY](https://slothy-optimizer.github.io/slothy/). Ultimately, **mlkem-c-aarch64** strives for constant-time implementations for which the C-code is, at minimum, verified to be free of undefined behaviour, and where all assembly is functionally verified.
**mlkem-c-aarch64** aims to provide a portfolio of implementations jointly providing competitive performance for most Armv8-A/Armv9-A microarchitectures. For some specific microarchitectures of particular interest, **mlkem-c-aarch64** may also provide CPU-specific implementations. Initially, our benchmarking platforms are: - Arm Cortex-A55