All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Add `apb` dependency of version 0.2.4
- Add support for the `FENCE` instruction
- Add support for DRAMsys5.0 co-simulation
- Add support for atomics in L2
- Add physically feasible TeraPool configuration with SubGroup hierarchy
- Extended `tracevis.py` to support the new Perfetto UI and compress large traces
- Use custom compiler for VCS specified with `CC` and `CCX` environment variables
- Implement operand gating for SIMD and MAC Units in Snitch IPU's DSP Unit
- Add Channel Estimation application and kernels
- Update Bender to version 0.28.1
- Update default Questasim version to 2022.3
- Decrease stack size to 128 words
- Add CFFT radix-4 and radix-2 kernels
- Parametrize the performance counters
- Restructure the software folder by moving kernels, data, and tests to dedicated folders
- Add Channel Estimation based on multiplication by (1 / pilot)
- Gate the LSU output to ensure a stable handshake interface and save energy
- Update to the latest GitHub CI actions
- Update `axi` to 0.39.2
- Update `tech_cells_generic` to 0.2.13
- Update `common_cells` to 1.33.0
- Update `common_verification` to 0.2.3
- Update `register_interface` to 0.4.3
- Updated Halide to version 15
- Move instruction cache into its own dependency
- Use automatically generated control registers
- Fix type issue in `snitch_addr_demux`
- Properly disable the debugging CSRs in ASIC implementations
- Fix a bug in the DMA's distributed midend
- Fix bugs in radix2, radix4by2 parallelization and loading of data for radix4 CFFT
- Fix performance bug in Snitch decoder
- Add a DMA
- Add pv.pack.h xpulpv2 instruction
- Add a script to generate random data to preload the L2 memory
- Add stack overflow simulator warning using dedicated CSR
- Measure the `wfi` stalls and stalls caused by `opc` properly
- Fix the allocator initialization
- After building GCC, copy `riscv.ld`, required for CMake, to the install folder
- Disable GCC tree-loop-distribute-patterns optimizations causing stack overflows
- Disable problematic GCC `memset` and `memcpy` built-in functions
- Increase the default AXI width to 512 for MemPool and TeraPool
- Register all control signals at hierarchy boundaries
- Upgrade to LLVM 14
- Support multiple outstanding wake-up calls in Snitch
- Clean out tracing script and improve the traces' size and checks
- Replace NUM_CORES and similar macros with function calls in software
- Fix CI runner to Ubuntu 20.04 and Python to version 3.8
- Add Halide runtime and build scripts for applications
- Add Halide example applications (2D convolution & matrix multiplication)
- Add CI workflow for MemPool with 256 cores
- Add hierarchical AXI interconnect to the `mempool_group`
- Integrate a `traffic_generator` into the tile
- Add a trace visualization script `tracevis.py`
- Add `config` flag to set specific MemPool flavor, either `minpool` or `mempool`
- Add bypass channels through the groups for the northeast intergroup connection
- Add capability to quickly write a value via a CSR
- Support for simulation with VCS through the `simvcs` and `simcvcs` Makefile targets
- Add Load Reserved and Store Conditional from "A" standard extension of RISC-V to the TCDM adapter
- Add the `terapool` configuration
- Add read-only caches to the hierarchical AXI interconnect
- Add a `memcpy` benchmark
- Add a systolic configuration including runtime support and a matmul application
- Add `axpy` kernel
- Add Spyglass linting scripts
- Add an OpenMP runtime and example applications
- Add dotp kernels
- Avoid the elaboration of SVA assertions on the `reorder_buffer` module
- Fix the elaboration of constant signal with an initial value in the `mempool_system` module
- Specify Halide's library path while installing
- Fix the waves scripts to match the new hierarchy names
- Increase pending queue in icache
- Make serial lookup in icache stallable
- Generalize MemPool to have any number of groups, configured through the `num_groups` parameter
- Kernel `conv_2d` will not preload unused values anymore
- Compile verilator and the verilated model with Clang, for a faster compilation time
- Update BibTeX reference to the MemPool DATE paper
- Rewrite the `traffic_generator` with DPI calls
- Replace group's butterflies with logarithmic interconnects
- Do not strip the binaries of debug symbols
- Remove tile's north/east TCDM connection shuffling from the groups
- Remove the reset synchronizer from the `mempool_cluster`
- Changed LSU from in-order memory responses to out-of-order memory responses
- Remove the `reorder_buffer` from the `tcdm_shim`
- Register wake-up signals and use `wfi` for barriers
- Bump the dependencies to the latest version (`common_cells`, `register_interface`, `axi`, `tech_cells_generic`)
- Use the latest version of Modelsim by default
- Consistently print Verilator's simulation time in decimal
- Add a timeout to CI stages that could run indefinitely on errors
- Deprecate `patch-hw` and replace it with the `update-deps` Makefile target, which updates and patches the dependencies.
- Bump bender to `v0.23.2`
- Bump verilator to `v4.218`
- Make the L2 memory multi-banked
- Improve parsing speed of tracevis by caching the `addr2line` calls
- Replace `/toolchain/riscv-opcodes` by submodule
- Change `make update_opcodes` to fit with new submodule structure of `riscv-opcodes`
- Update CI to work with new submodule structure of `riscv-opcodes`
- Disable `rvv` extension for `riscv-isa-sim`
- Issue write responses to Snitch for the TCDM and AXI interconnect
- Bump axi to `v0.36.0`
- Run simulations at 500MHz by default
- Capability to enable and disable the traces with a CSR
- CPU model for MemPool in GCC to enable correct instruction scheduling
- Added GitHub CI flow
- Allow atomic extension to be enabled in GCC
- Replace atomic library with the corresponding builtins
- Compile all applications in the CI instead of only the ones executed
- Move Halide applications to their own directory
- Move linting scripts to `scripts` folder
- Move flat hardware dependencies to submodules
- Updated LLVM to version 12
- Updated Halide to version 12
- Run unit tests with Verilator
- Rename `mempool` to `mempool_cluster`
- Restructure software folder
- Stall cycle counting in the trace no longer misses stalls
- Remove unwanted latches in instruction cache
- Boot ROM address offset depends on the data width of the ROM
- Toolchain and hardware support for Xpulp instructions:
  - Post-incrementing and register-register loads and stores (`pv.lb[u]`, `pv.lh[u]`, `pv.lw`)
  - 32-bit multiply-accumulate instructions (`pv.mac`, `pv.msu`)
  - Arithmetic SIMD instructions (`pv.{add, sub, abs, avg, avgu, min, minu, max, maxu, srl, sra, sll, or, xor, and, dotsp, dotup, dotusp, sdotsp, sdotup, sdotusp}.{h, b}`)
  - Sub-word manipulation SIMD instructions (`pv.{extract, extractu, insert, shuffle2}.{h, b}`)
- Disable the branch prediction if there are multiple early-hits
- Align end of `.text` section with the instruction cache
- Observe the code style guidelines in the matrix multiplication and convolution kernels
- Clean-up the pedantic compilation warnings of the matrix multiplication and convolution kernels
- Assertion checking that Snitch's instruction interface is stable during stalls
- Update `axi` dependency to 0.27.1
- Change I$ policy to avoid evicting the cache-line currently in use
- Make the L0 cache's data latch-based and double its size
- Make the L1 cache's tag latch-based
- Serialize the L1 lookup
- Add a workaround for a Modelsim 2019 bug in the `axi_demux`
- Keep clang-format from reformatting the `apps/common/riscv_test.h` assembly header file
- Initial release.