Skip to content

Commit

Permalink
Add equivalence tester
Browse files Browse the repository at this point in the history
By default, SLOTHY includes a 'selfcheck' verifying that the DFGs
of input and output are isomorphic with respect to the rescheduling
found by SLOTHY. This selfcheck is very powerful and should catch most
modelling errors in SLOTHY.

However, some aspects of SLOTHY remain outside of the scope of the
DFG-based selfcheck:

- Address offset fixup

  This is a historically rather error prone feature to which the self-check
  is blind (since it relies on making the DFG blind to address increments).

- Loop boundaries

  If SW pipelining is enabled, the selfcheck checks equivalence of a
  _bounded unrolling_ of the input and output loops.

  Therefore, it would not catch issues with (a) the loop boundary, or
  (b) exceptional iterations for low loop counts.

While it is not SLOTHY's goal to provide formal guarantees of correctness
of the optimizations -- this should be gauged/confirmed through extensive test
and ideally formal verification -- development would be faster if bugs
in the aforementioned areas were caught before the developer integrates
and tests the SLOTHY output manually.

As a step towards catching more potential pitfalls early in the SLOTHY
workflow, this commit starts the development of an _equivalence tester_.

The idea of the equivalence tester is simple: Just run original and optimized
assembly with various random inputs, and check that they indeed lead to
equivalent outputs.

This commit implements this idea for AArch64 and Armv7-M, leveraging
LLVM-MC as an assembler and the unicorn-engine for emulation.

An initial limitation is support for loops: For now, the equivalence
checker tests bounded unrollings of loops, just like the selfcheck. That is
against the point of the equivalenc check, but it will be improved at a later
point, after some refactoring of the loop generation code.

The checker added in this commit is able to detect issues with address offset
fixup, however, which would previously have gone unnoticed by the selfcheck.

The new equivalence test is configured via config.selftest, which is enabled
by default, but silently skipped for unsupported platforms or target architectures.
  • Loading branch information
hanno-becker committed Dec 8, 2024
1 parent 2ebb278 commit c96bc14
Show file tree
Hide file tree
Showing 8 changed files with 422 additions and 52 deletions.
46 changes: 9 additions & 37 deletions .github/workflows/test_basic.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,16 +12,12 @@ jobs:
github.event.pull_request.user.login == 'mkannwischer'
}}
runs-on: ubuntu-latest
strategy:
matrix:
strategy:
matrix:
target: [slothy.targets.arm_v7m.cortex_m7,slothy.targets.arm_v81m.cortex_m55r1, slothy.targets.arm_v81m.cortex_m85r1, slothy.targets.aarch64.cortex_a55, slothy.targets.aarch64.cortex_a72_frontend, slothy.targets.aarch64.apple_m1_firestorm_experimental, slothy.targets.aarch64.apple_m1_icestorm_experimental]
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
python3 example.py --dry-run --only-target=${{ matrix.target }}
Expand All @@ -34,11 +30,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run tutorial
run: |
(cd tutorial && ./tutorial_all.sh)
Expand All @@ -51,11 +43,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
python3 example.py --examples simple0,simple1,simple0_loop,simple1_loop
Expand All @@ -68,11 +56,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
python3 example.py --examples ntt_kyber_1_23_45_67_m55,ntt_dilithium_12_34_56_78_m55 --timeout=300
Expand All @@ -85,11 +69,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
python3 example.py --examples ntt_kyber_123_4567_a55,ntt_dilithium_123_45678_a55 --timeout=300
Expand All @@ -102,11 +82,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
(cd paper/scripts && NO_LOG=Y ./slothy_sqmag.sh)
Expand All @@ -119,11 +95,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install python dependencies
run: |
python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt
echo BASH_ENV="./venv/bin/activate" >> $GITHUB_ENV
- uses: ./.github/actions/setup-ubuntu
- name: Run examples
run: |
(cd paper/scripts && NO_LOG=Y ./slothy_fft.sh)
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@ numpy==1.26.4
ortools==9.7.2996
pandas==2.1.1
sympy==1.12
unicorn=2.1.1
75 changes: 75 additions & 0 deletions slothy/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,52 @@ def reserved_regs_are_locked(self):
to eliminate uses of particular registers from some piece of assembly."""
return self._reserved_regs_are_locked

@property
def selftest(self):
"""Indicates whether SLOTHY performs an empirical equivalence-test on the
optimization results.
When this is set, and if the target architecture and host platform support it,
this will run an empirical equivalence checker trying to confirm that the
input and output of SLOTHY are likely functionally equivalent.
The primary purpose of this checker is to detect issue that would presently
be overlooked by the selfcheck:
- The selfcheck is currently blind to address offset fixup. If something goes
wrong, the input and output will not be functionally equivalent, but we would
only notice once we actually compile and run the code. The selftest will
likely catch issues.
- When using software pipelining, the selfcheck reduces to a straightline check
for a bounded unrolling of the loop. An unbounded selfcheck is currently not
implemented.
With the selftest, you still need to fix a loop bound, but at least you can
equivalence-check the loop-form (including the compare+branch instructions
at the loop boundary) rather than the unrolled code.
DEPENDENCY: To run this, you need `llvm-mc` the binary in your path or configured
as via `llvm_mc_binary`, and `unicorn-engine` Python bindings setup.
NOTE: This is so far implemented as a repeated randomized test -- nothing clever.
"""
return self._selftest

@property
def selftest_iterations(self):
"""If selftest is set, indicates the number of random selftest to conduct"""
return self._selftest_iterations

@property
def selftest_address_gprs(self):
"""Dictionary of (reg, sz) items indicating which registers are assumed to be
pointers to memory, and if so, of what size."""
return self._selftest_address_gprs

@property
def selftest_default_memory_size(self):
"""Default buffer size to use for registers which are automatically inferred to be
used as pointers and for which no memory size has been configured via `address_gprs`."""
return self._selftest_default_memory_size

@property
def selfcheck(self):
"""Indicates whether SLOTHY performs a self-check on the optimization result.
Expand Down Expand Up @@ -431,6 +477,13 @@ def llvm_mca_binary(self):
is set."""
return self._llvm_mca_binary

@property
def llvm_mc_binary(self):
"""The llvm-mc binary to be used for assembling output data
This is only relevant if `selftest` is set."""
return self._llvm_mc_binary

@property
def timeout(self):
"""The timeout in seconds after which the underlying constraint solver stops
Expand Down Expand Up @@ -1143,6 +1196,10 @@ def __init__(self, Arch, Target):
self._reserved_regs = None
self._reserved_regs_are_locked = True

self._selftest = True
self._selftest_iterations = 10
self._selftest_address_gprs = None
self._selftest_default_memory_size = 1024
self._selfcheck = True
self._selfcheck_failure_logfile = None
self._allow_useless_instructions = False
Expand Down Expand Up @@ -1172,6 +1229,7 @@ def __init__(self, Arch, Target):
self._compiler_binary = "gcc"
self._compiler_include_paths = None
self._llvm_mca_binary = "llvm-mca"
self._llvm_mc_binary = "llvm-mc"

self.keep_tags = True
self.inherit_macro_comments = False
Expand Down Expand Up @@ -1260,6 +1318,20 @@ def reserved_regs_are_locked(self,val):
@variable_size.setter
def variable_size(self,val):
self._variable_size = val
@selftest.setter
def selftest(self,val):
if hasattr(self.arch, "Checker") is False:
raise InvalidConfig("Trying to enable checker, but architecture model does not seem to support it")
self._selftest = val
@selftest_iterations.setter
def selftest_iterations(self,val):
self._selftest_iterations = val
@selftest_address_gprs.setter
def selftest_address_gprs(self,val):
self._selftest_address_gprs = val
@selftest_default_memory_size.setter
def selftest_default_memory_size(self,val):
self._selftest_default_memory_size = val
@selfcheck.setter
def selfcheck(self,val):
self._selfcheck = val
Expand Down Expand Up @@ -1308,6 +1380,9 @@ def compiler_include_paths(self, val):
@llvm_mca_binary.setter
def llvm_mca_binary(self, val):
self._llvm_mca_binary = val
@llvm_mc_binary.setter
def llvm_mc_binary(self, val):
self._llvm_mc_binary = val
@timeout.setter
def timeout(self, val):
self._timeout = val
Expand Down
130 changes: 128 additions & 2 deletions slothy/core/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@

import logging
import math
import os
import time

from types import SimpleNamespace
Expand All @@ -38,7 +39,12 @@
from ortools.sat.python import cp_model

from slothy.core.config import Config
from slothy.helper import LockAttributes, Permutation, DeferHandler, SourceLine
from slothy.helper import LockAttributes, Permutation, DeferHandler, SourceLine, LLVM_Mc

try:
from unicorn import Uc
except ImportError:
Uc = None

from slothy.core.dataflow import DataFlowGraph as DFG
from slothy.core.dataflow import Config as DFGConfig
Expand Down Expand Up @@ -818,6 +824,121 @@ def selfcheck(self, log):
raise SlothySelfCheckException("Isomorphism between computation flow graphs: FAIL!")
return res

def selftest(self, log):
"""Run empirical self test, if it exists for the target architecture"""
if self._config.selftest is False:
return

if Uc is None:
raise SlothySelfTestException("Cannot run selftest -- unicorn-engine is not available.")

if self._config.arch.unicorn_arch is None or \
self._config.arch.llvm_mc_arch is None:
log.warning("Selftest not supported on target architecture")
return

log.info(f"Running selftest ({self._config.selftest_iterations} iterations)...")

address_gprs = self._config.selftest_address_gprs
if address_gprs is None:
# Try to infer which registes need to be pointers
log_addresses = log.getChild("infer_address_gprs")
tree = DFG(self._orig_code, log_addresses, DFGConfig(self.config, outputs=self.outputs))
# Look for load/store instructions and remember addresses
addresses = set()
for t in tree.nodes:
addr = getattr(t.inst, "addr", None)
if addr is None:
continue
addresses.add(addr)

# For now, we don't look into increments and immedate offsets
# to gauge the amount of memory we actually need. Instaed, we
# just allocate a buffer of a configurable default size.
log.info(f"Inferred that the following registers seem to act as pointers: {addresses}")
log.info(f"Using default buffer size of {self._config.selftest_default_memory_size} bytes. "
"If you want different buffer sizes, set selftest_address_gprs manually.")
address_gprs = { a: self._config.selftest_default_memory_size for a in addresses }

# This produces _unrolled_ code, the same that is checked in the selfcheck.
# The selftest should instead use the rolled form of the loop.
iterations = 7
if self.config.sw_pipelining.enabled is True:
old_source, new_source = self.get_fully_unrolled_loop(iterations)
else:
old_source, new_source = self.orig_code, self.code

CODE_BASE = 0x010000
CODE_SZ = 0x010000
RAM_BASE = 0x020000
RAM_SZ = 0x010000

regs = [r for ty in self._config.arch.RegisterType for r in \
self._config.arch.RegisterType.list_registers(ty)]

def run_code(code, txt=None):
objcode = LLVM_Mc.assemble(code, self._config.llvm_mc_binary,
self._config.arch.llvm_mc_arch,
self._config.arch.llvm_mc_attr,
log)

# Setup emulator
mu = Uc(self.config.arch.unicorn_arch, self.config.arch.unicorn_mode)
# Copy initial register contents into emulator
for r,v in initial_register_contents.items():
ur = self._config.arch.RegisterType.unicorn_reg_by_name(r)
if ur is None:
continue
mu.reg_write(ur, v)
# Copy code into emulator
mu.mem_map(CODE_BASE, CODE_SZ)
mu.mem_write(CODE_BASE, objcode)
# Copy initial memory contents into emulator
mu.mem_map(RAM_BASE, RAM_SZ)
mu.mem_write(RAM_BASE, initial_memory)
# Run emulator
mu.emu_start(CODE_BASE, CODE_BASE + len(objcode))

final_register_contents = {}
for r in regs:
ur = self._config.arch.RegisterType.unicorn_reg_by_name(r)
if ur is None:
continue
final_register_contents[r] = mu.reg_read(ur)
final_memory_contents = mu.mem_read(RAM_BASE, RAM_SZ)

return final_register_contents, final_memory_contents

for _ in range(self._config.selftest_iterations):
initial_memory = os.urandom(RAM_SZ)
cur_ram = RAM_BASE
# Set initial register contents arbitrarily, except for registers
# which must hold valid memory addresses.
initial_register_contents = {}
for r in regs:
initial_register_contents[r] = int.from_bytes(os.urandom(16))
for (reg, sz) in address_gprs.items():
initial_register_contents[reg] = cur_ram
cur_ram += sz

final_regs_old, final_mem_old = run_code(old_source, txt="old")
final_regs_new, final_mem_new = run_code(new_source, txt="new")

# Check if memory contents are the same
if final_mem_old != final_mem_new:
raise SlothySelfTestException(f"Selftest failed: Memory mismatch")

# Check if register contents are the same
regs_expected = set(self.config.outputs).union(self.config.reserved_regs)
# Ignore hint registers, flags and sp for now
regs_expected = set(filter(lambda t: t.startswith("t") is False and
t != "sp" and t != "flags", regs_expected))
for r in regs_expected:
if final_regs_old[r] != final_regs_new[r]:
raise SlothySelfTestException(f"Selftest failed: Register mismatch for {r}: {hex(final_regs_old[r])} != {hex(final_regs_new[r])}")

log.info("Selftest: OK")

def selfcheck_with_fixup(self, log):
"""Do selfcheck, and consider preamble/postamble fixup in case of SW pipelining
Expand Down Expand Up @@ -1315,6 +1436,9 @@ def __init__(self, config):
class SlothySelfCheckException(Exception):
"""Exception thrown upon selfcheck failures"""

class SlothySelfTestException(Exception):
"""Exception thrown upon selftest failures"""

class SlothyBase(LockAttributes):
"""Stateless core of SLOTHY.
Expand Down Expand Up @@ -1875,6 +1999,8 @@ def _extract_result(self):
self._result.selfcheck_with_fixup(self.logger.getChild("selfcheck"))
self._result.offset_fixup(self.logger.getChild("fixup"))

self._result.selftest(self.logger.getChild("selftest"))

def _extract_positions(self, get_value):

if self.config.variable_size:
Expand Down Expand Up @@ -2952,7 +3078,7 @@ def _add_constraints_latencies(self):
if isinstance(latency, int):
self.logger.debug("General latency constraint: [%s] >= [%s] + %d",
t, i.src, latency)
# Some microarchitectures have instructions with 0-cycle latency, i.e., the can
# Some microarchitectures have instructions with 0-cycle latency, i.e., the can
# forward the result to an instruction in the same cycle (e.g., X+str on Cortex-M7)
# If that is the case we need to make sure that the consumer is after the producer
# in the output.
Expand Down
Loading

0 comments on commit c96bc14

Please sign in to comment.