The goal of this project is to run Convolutional Neural Network layers on flash-resident matrices. Currently, GEMM using NEON on ARM CPUs is implemented.
This project is built on top of BLAS-on-flash and uses the Arm Compute Library.
- BLAS-on-flash https://github.com/microsoft/BLAS-on-flash
- Arm Compute Library https://github.com/ARM-software/ComputeLibrary
- Ubuntu 16.04
- Arm Compute Library 19.02
- built with the NEON option enabled
Set the following build options in CMakeLists.txt as needed.
vim CMakeLists.txt
- PROGRAM_BUDGET Memory budget for GEMM, in bytes
- GEMM_BLK_SIZE The number of rows and columns of each submatrix
- N_IO_THR The number of I/O threads
- N_COMPUTE_THR The number of compute threads
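These options describe a blocked GEMM: the matrices are partitioned into GEMM_BLK_SIZE x GEMM_BLK_SIZE submatrices so that only a few blocks must be memory-resident at a time, which is what keeps the working set under PROGRAM_BUDGET. A minimal in-memory sketch of that blocking scheme (the function name `blocked_gemm` and its signature are illustrative, not the project's actual API):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative blocked GEMM: C += A * B with A (m x k), B (k x n), C (m x n),
// all row-major. `blk` plays the role of GEMM_BLK_SIZE: each (bi, bk, bj)
// iteration touches only three blk x blk submatrices, so a flash-resident
// implementation can stage just those blocks in memory at a time.
void blocked_gemm(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t m, std::size_t k,
                  std::size_t n, std::size_t blk) {
    for (std::size_t bi = 0; bi < m; bi += blk)
        for (std::size_t bk = 0; bk < k; bk += blk)
            for (std::size_t bj = 0; bj < n; bj += blk)
                // Multiply one pair of submatrices into one output block.
                for (std::size_t i = bi; i < std::min(bi + blk, m); ++i)
                    for (std::size_t kk = bk; kk < std::min(bk + blk, k); ++kk)
                        for (std::size_t j = bj; j < std::min(bj + blk, n); ++j)
                            C[i * n + j] += A[i * k + kk] * B[kk * n + j];
}
```

In the actual project each block multiplication would be dispatched to an ACL NEON GEMM kernel rather than the scalar inner loops shown here.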
git clone
vim CMakeLists.txt
- modify set (ACL_ROOT [arm_compute_library_path]) to point to your Arm Compute Library path
mkdir bin && cd bin
cmake ..
make
cd ..
GEMM execution
cd misc
chmod +x exec.sh
./exec.sh [A_row] [B_row] [B_col]
Example case with
- size of input and output matrices = 4096x4096
- GEMM_BLK_SIZE = 512
- various memory budgets
- run on an Odroid-XU4 (Exynos 5422)

Inference time and maximum memory usage are shown in the following graph.
A more detailed explanation of the method and results can be found in the BLAS-on-flash paper and this paper.
CNN-on-flash is open-source software licensed under the MIT License.