Add neureka support #2

Merged: 72 commits, Feb 1, 2024

Commits
8cd7965
Add neureka support similar to ne16
lukamac Jan 14, 2024
e096415
add neureka support to test app
lukamac Jan 14, 2024
c480025
Add weight mem source flag
lukamac Jan 14, 2024
953db1e
Fix formatting
lukamac Jan 15, 2024
a114acd
Fix strides and counters, remove stride2x2 and flag16 mode
lukamac Jan 16, 2024
94bbe08
Fix uninitialized L1 data
lukamac Jan 16, 2024
6341d0e
Fixup nnx_quant_t nnx_norm_t
lukamac Jan 16, 2024
eddf389
Add Neureka weight roll/unroll script
lukamac Jan 17, 2024
e281f4d
Fix NE16 weight rolling, which was unpacking to bits instead of 8 bits
lukamac Jan 17, 2024
d6008b6
Fix generated arrays initialization
lukamac Jan 18, 2024
56afb5f
Fix Neureka weight unroll for 1x1 mode
lukamac Jan 18, 2024
925aa41
Fix Arpan's name in the contributors
lukamac Jan 18, 2024
0140665
Remove WIEGHT_D0_STRIDE_MODE_1x1
lukamac Jan 18, 2024
741b5d7
Add multi-accelerator support and neureka as a target
lukamac Jan 18, 2024
27664cd
Remove stride2x2 for neureka
lukamac Jan 18, 2024
ccc835b
Fix formatting
lukamac Jan 18, 2024
df0eb6b
Skip invalid tests
lukamac Jan 18, 2024
31dc36e
Change invalid test skip to explicit pytest skip with reason
lukamac Jan 18, 2024
7bae33c
Add neureka test to the CI
lukamac Jan 18, 2024
a0080c6
Fix formatting
lukamac Jan 18, 2024
be50522
Add --print-tensors flag
lukamac Jan 19, 2024
2c07dbd
Remove memcpys since it was a linker script bug
lukamac Jan 19, 2024
6a82ff5
Add Application section to test's readme
lukamac Jan 19, 2024
295ce90
Fix formatting
lukamac Jan 19, 2024
dc2409c
Fix accelerator name printing
lukamac Jan 19, 2024
d6ba620
Add input_signed to neureka
lukamac Jan 19, 2024
0934172
Add readme per accelerator
lukamac Jan 19, 2024
2947229
Remove ne16 input dim 2 stride calculation
lukamac Jan 19, 2024
b7442ee
Move common validators to NnxTest
lukamac Jan 19, 2024
91effde
Fix xored relu with unsigned instead of signed
lukamac Jan 22, 2024
0187793
Change gvsoc functions to have a body only when running in gvsoc
lukamac Jan 22, 2024
4fab480
Remove %s from main strings
lukamac Jan 22, 2024
3906e7a
Fix output checking should be done before cluster close since data is…
lukamac Jan 22, 2024
774cfd3
Remove L2 copies of tensors
lukamac Jan 22, 2024
4358ed5
Replace pointer arithmetic with array indexing
lukamac Jan 22, 2024
668c045
Fix formatting
lukamac Jan 22, 2024
cf70c78
Fix Neureka name in readme
lukamac Jan 22, 2024
b02e018
Align names with Neureka.py
lukamac Jan 24, 2024
5e61e39
Pad cin
lukamac Jan 24, 2024
8345b1d
Rename <acc>.py to <acc>MemoryLayout.py
lukamac Jan 25, 2024
a623ccf
Extract functional model from test gen
lukamac Jan 25, 2024
b743930
Remove conf from NeuralEngineFunctionalModel
lukamac Jan 25, 2024
f425ea5
Add isort
lukamac Jan 26, 2024
932847d
Remove neureka siracusa clock gating
lukamac Jan 26, 2024
7ff3fcb
Remove inline from hal
lukamac Jan 26, 2024
772fd95
Remove xor, Python has xor
lukamac Jan 26, 2024
2223af9
WIP: Add without norm_quant
lukamac Jan 26, 2024
8232527
Remove -flto
lukamac Jan 26, 2024
df3f5dd
Add -std=c11
lukamac Jan 26, 2024
24ebd9e
Move stride shift to stride2x2 function
lukamac Jan 26, 2024
f1ed5f6
Fix flag clear before setting
lukamac Jan 26, 2024
c47b2c5
Fix normMode hardcoded to 32bit
lukamac Jan 26, 2024
27ab3a5
Set quantMode in *_set_bits function
lukamac Jan 26, 2024
23009cc
Fix output d0 stride and rename defs
lukamac Jan 27, 2024
e78dd80
Fixes to strides and stride2x2
lukamac Jan 27, 2024
a6f142a
Add no norm_quant to neureka and all the fixes too
lukamac Jan 27, 2024
c436ea4
Fix stride2x2 validity check for out channel to check stride evenness
lukamac Jan 27, 2024
eda8b51
Fix formatting
lukamac Jan 27, 2024
8b37485
Remove TODOs because neureka clearly needs these functions
lukamac Jan 27, 2024
d9c45ef
Remove equal 0 check because that can never be the case due to the re…
lukamac Jan 27, 2024
29ee483
Remove TODOs, checked padding
lukamac Jan 27, 2024
d9c7723
Rename divnceil and remainder, and add nnx_ prefix
lukamac Jan 27, 2024
37ba86c
Add citation
lukamac Jan 29, 2024
9e0b211
Add sdk and compiler commit hashes
lukamac Jan 29, 2024
1a4f873
Change task size to a define
lukamac Jan 29, 2024
0412759
Update supported accelerator features
lukamac Jan 29, 2024
b9f3de4
Update changelog
lukamac Jan 29, 2024
5990d83
Update pulp-sdk commit hash
lukamac Jan 29, 2024
07f47d9
Remove -std=c11 flag
lukamac Jan 29, 2024
0791162
Fix readme collapsible verbatim
lukamac Jan 29, 2024
9145445
Remove __PLATFORM__ check from the library since it's pulp-sdk specific
lukamac Jan 29, 2024
6d24dd8
Change channel and bits with w_in_stride for set_ptrs
lukamac Jan 30, 2024
24 changes: 20 additions & 4 deletions .gitlab-ci.yml
@@ -20,25 +20,41 @@ stages:
- lint
- test

format_python:
python_format:
stage: lint
tags:
- python-lint
script:
- black --check .

static_check_python:
python_sort_imports:
stage: lint
tags:
- python-lint
script:
- isort --check test

python_static_check:
stage: lint
tags:
- python-lint
script:
- pyright .

run_test0:
run_ne16_test:
stage: test
tags:
- gap9-sdk
artifacts:
untracked: true
script:
- cd test && pytest test.py --test-dir tests --recursive
- cd test && pytest test.py --test-dir tests --recursive -A ne16

run_neureka_test:
stage: test
tags:
- siracusa-sdk
artifacts:
untracked: true
script:
- cd test && pytest test.py --test-dir tests --recursive -A neureka
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## [Unreleased]

### Added

- N-EUREKA accelerator support: 3x3, 1x1, and 3x3 depthwise convolution kernels
- Support for kernels without normalization and quantization for NE16
- isort check
- publication citation

### Changed

- `ne16_task_init` got split into smaller parts: `ne16_task_init`, `ne16_task_set_op_to_conv`, `ne16_task_set_weight_offset`, `ne16_task_set_bits`, `ne16_task_set_norm_quant`
- strides in `ne16_task_set_strides`, `ne16_task_set_dims`, and `ne16_task_set_ptrs` are now strides between consecutive elements in that dimension
- `ne16_task_queue_size` is now `NE16_TASK_QUEUE_SIZE`

### Removed

- `k_in_stride`, `w_in_stride`, `k_out_stride`, and `w_out_stride` from `ne16_nnx_dispatch_stride2x2`
- `mode` attribute from `ne16_quant_t` structure

## [0.3.0] - 2024-01-14

### Added
80 changes: 37 additions & 43 deletions README.md
@@ -39,51 +39,22 @@ _Note: The accelerator can provide additional helper functions if needed._

## Accelerators

### NE16

Github repo [link](https://github.com/pulp-platform/ne16).

#### Implemented features

- [x] Convolution w/ kernel shape 1x1
- [x] Convolution w/ kernel shape 3x3
- [x] Depthwise convolution w/ kernel shape 3x3
- [x] Stride 1x1
- [x] Stride 2x2
- [ ] Normalization and quantization
- [x] With
- [ ] Without
- [x] Relu (w/ and w/o)
- [x] Bias (w/ and w/o)
- [ ] Per-channel shift
- [x] Per-layer shift
- [ ] Rounding
- [ ] Input type
- [x] uint8
- [ ] uint16
- [ ] Output type
- [x] int8
- [x] uint8 (only w/ Relu)
- [ ] int32
- [ ] uint32 (only w/ Relu)
- [ ] Scale type
- [x] uint8
- [ ] uint16
- [ ] uint32
- [x] Bias type
- [x] int32
- [ ] Weight type
- [x] int8
- [ ] int2-7

### Neureka

**Untested and considered broken.**
- [NE16](ne16/README.md)
- [Neureka](neureka/README.md)

## Testing

You can find information about testing in the dedicated [README](test/README.md).

### Environment

The library was tested with the following pairs of SDKs and compilers:

| SDK | SDK Commit Hash | Compiler | Compiler Commit Hash |
| --- | --------------- | -------- | -------------------- |
| gap\_sdk (obtainable from GreenWaves Technologies) | 90df4ce219 | [gap\_gnu\_toolchain](https://github.com/GreenWaves-Technologies/gap_gnu_toolchain) | 360fd4f9d6 |
| [pulp-sdk](https://github.com/Scheremo/pulp-sdk) | c216298881 | [pulp-riscv-gnu-toolchain](https://github.com/GreenWaves-Technologies/gap_gnu_toolchain) | 9938bd8fcf (release v1.0.16) |

## Contributing

Bug reports and feature requests should be reported through issues.
@@ -93,15 +64,38 @@ All the development should be done through forks and merged onto the `dev` branch

The library will follow [Semantic Versioning](https://semver.org/).

## Citing
## Publication

<details>
<summary>If you use PULP-NNX in your work, you can cite us:</summary>

```
@inproceedings{10.1145/3607889.3609092,
author = {Macan, Luka and Burrello, Alessio and Benini, Luca and Conti, Francesco},
title = {WIP: Automatic DNN Deployment on Heterogeneous Platforms: the GAP9 Case Study},
year = {2024},
isbn = {9798400702907},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3607889.3609092},
doi = {10.1145/3607889.3609092},
abstract = {Emerging Artificial-Intelligence-enabled System-on-Chips (AI-SoCs) combine a flexible microcontroller with parallel Digital Signal Processors (DSP) and heterogeneous acceleration capabilities. In this Work-in-Progress paper, we focus on the GAP9 RISC-V SoC as a case study to show how the open-source DORY Deep Neural Network (DNN) tool flow can be extended for heterogeneous acceleration by fine grained interleaving of a dedicated Neural Engine and a cluster of RISC-V cores. Our results show that up to 91\% of the peak accelerator throughput can be extracted in end-to-end execution of benchmarks based on MobileNet-V1 and V2.},
booktitle = {Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems},
pages = {9–10},
numpages = {2},
keywords = {TinyML, MCUs, deep learning, HW accelerators},
location = {Hamburg, Germany},
series = {CASES '23 Companion}
}
```

*TBA*
</details>

## Contributors

* Luka Macan <[[email protected]](mailto:[email protected])>
* Francesco Conti <[[email protected]](mailto:[email protected])>
* Arpan Prasad <[[email protected]](mailto:[email protected])>
* Arpan Suravi Prasad <[[email protected]](mailto:[email protected])>

## License

15 changes: 7 additions & 8 deletions inc/pulp_nnx_ne16.h
@@ -43,7 +43,8 @@ void ne16_nnx_dispatch_wait(ne16_dev_t *dev);
/** ne16_nnx_dispatch
*
* Dispatch a task to the accelerator.
* Fails with return code 1 if the task cannot be dispatched. Otherwise returns 0.
* Fails with return code 1 if the task cannot be dispatched. Otherwise returns
* 0.
*/
int ne16_nnx_dispatch(ne16_dev_t *dev, ne16_task_t *task);

@@ -59,7 +60,6 @@ int ne16_nnx_resolve_check(ne16_dev_t *dev, ne16_task_t *task);
*/
void ne16_nnx_resolve_wait(ne16_dev_t *dev, ne16_task_t *task);


/* Additional helper functions */

/** ne16_nnx_dispatch_stride2x2
@@ -69,9 +69,8 @@ void ne16_nnx_resolve_wait(ne16_dev_t *dev, ne16_task_t *task);
* tile the tile to the subtile's spatial dimensions (in this case 3x3 output).
* Works only if the k_out is divisible by 2.
*/
void ne16_nnx_dispatch_stride2x2(
ne16_dev_t *dev, ne16_task_t *task, const uint32_t w_in, const uint32_t k_in,
const uint32_t w_in_stride, const uint32_t k_in_stride,
const uint32_t h_out, const uint32_t w_out, const uint32_t k_out,
const uint32_t w_out_stride, const uint32_t k_out_stride,
const uint8_t h_ker, const uint8_t w_ker);
void ne16_nnx_dispatch_stride2x2(ne16_dev_t *dev, ne16_task_t *task,
const uint32_t w_in, const uint32_t k_in,
const uint32_t h_out, const uint32_t w_out,
const uint32_t k_out, const uint8_t h_ker,
const uint8_t w_ker);
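
For orientation, here is a minimal usage sketch of the dispatch/resolve API declared in this header. It only strings together the functions shown above; device bring-up and the `ne16_task_t` configuration are assumed to happen elsewhere and are not part of this diff.

```c
#include "pulp_nnx_ne16.h"

// Minimal sketch: `dev` and `task` are assumed to be initialized and fully
// configured elsewhere (device bring-up and task setup are not shown here).
static void run_one_ne16_task(ne16_dev_t *dev, ne16_task_t *task) {
  // ne16_nnx_dispatch returns 1 while the task queue is full, 0 on success.
  while (ne16_nnx_dispatch(dev, task)) {
    ne16_nnx_dispatch_wait(dev); // block until a queue slot frees up
  }
  // Block until the accelerator has finished this task.
  ne16_nnx_resolve_wait(dev, task);
}
```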
61 changes: 61 additions & 0 deletions inc/pulp_nnx_neureka.h
@@ -0,0 +1,61 @@
/*
* Luka Macan <[email protected]>
*
* Copyright 2023 ETH Zurich and University of Bologna
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* SPDX-License-Identifier: Apache-2.0
*/

#include "neureka.h"
#include "neureka_siracusa_bsp.h"
#include "neureka_task.h"
#include <stdint.h>

/* PULP-NNX interface */

void neureka_nnx_init(neureka_dev_t *dev, neureka_siracusa_conf_t *conf);
void neureka_nnx_term(neureka_dev_t *dev);

/** neureka_nnx_dispatch_check
*
* Check whether you can dispatch to the accelerator.
*/
int neureka_nnx_dispatch_check(neureka_dev_t *dev);

/** neureka_nnx_dispatch_wait
*
* Block until you can dispatch to the accelerator.
*/
void neureka_nnx_dispatch_wait(neureka_dev_t *dev);

/** neureka_nnx_dispatch
*
* Dispatch a task to the accelerator.
* Fails with return code 1 if the task cannot be dispatched. Otherwise returns
* 0.
*/
int neureka_nnx_dispatch(neureka_dev_t *dev, neureka_task_t *task);

/** neureka_nnx_resolve_check
*
* Check whether the task has been resolved.
*/
int neureka_nnx_resolve_check(neureka_dev_t *dev, neureka_task_t *task);

/** neureka_nnx_resolve_wait
*
* Block until you can resolve the task.
*/
void neureka_nnx_resolve_wait(neureka_dev_t *dev, neureka_task_t *task);
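
As with the NE16 header, a hedged end-to-end sketch of the N-EUREKA lifecycle built only from the declarations above; the contents of `neureka_task_t` and `neureka_siracusa_conf_t` are assumed to be configured elsewhere.

```c
#include "pulp_nnx_neureka.h"

// Minimal sketch: `conf` and `task` are assumed to be set up elsewhere; this
// only chains the calls declared in this header.
static void run_one_neureka_task(neureka_siracusa_conf_t *conf,
                                 neureka_dev_t *dev, neureka_task_t *task) {
  neureka_nnx_init(dev, conf);

  // neureka_nnx_dispatch returns 1 while the task queue is full, 0 on success.
  while (neureka_nnx_dispatch(dev, task)) {
    neureka_nnx_dispatch_wait(dev); // block until a queue slot frees up
  }

  neureka_nnx_resolve_wait(dev, task); // wait for completion
  neureka_nnx_term(dev);
}
```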
36 changes: 36 additions & 0 deletions ne16/README.md
@@ -0,0 +1,36 @@
# NE16

## Docs

- Github repo [link](https://github.com/pulp-platform/ne16).

## Implemented features

- [x] Convolution w/ kernel shape 1x1
- [x] Convolution w/ kernel shape 3x3
- [x] Depthwise convolution w/ kernel shape 3x3
- [x] Stride 2x2
- [ ] Normalization and quantization
- [x] With
- [x] Without
- [x] Relu (w/ and w/o)
- [x] Bias (w/ and w/o)
- [ ] Per-channel shift
- [x] Per-layer shift
- [ ] Rounding
- [ ] Input type
- [x] uint8
- [ ] uint16
- [ ] Output type
- [x] int8
- [x] uint8 (only w/ Relu)
- [x] int32
- [ ] Scale type
- [x] uint8
- [ ] uint16
- [ ] uint32
- [x] Bias type
- [x] int32
- [ ] Weight type
- [x] int8
- [ ] int2-7
2 changes: 0 additions & 2 deletions ne16/hal/ne16.c
@@ -23,8 +23,6 @@
#define NE16_STATUS_EMPTY (0x000)
#define NE16_STATUS_FULL (0x101)

inline int ne16_task_queue_size(ne16_dev_t *dev) { return 2; }

inline int ne16_task_queue_tasks_in_flight(ne16_dev_t *dev) {
uint32_t status = hwpe_task_queue_status(&dev->hwpe_dev);
return (status & 0x1) + ((status >> 8) & 0x1);
3 changes: 2 additions & 1 deletion ne16/hal/ne16.h
@@ -24,11 +24,12 @@
#include "hwpe.h"
#include <stdint.h>

#define NE16_TASK_QUEUE_SIZE (2)

typedef struct ne16_dev_t {
hwpe_dev_t hwpe_dev; /* Implements the HWPE device interface */
} ne16_dev_t;

int ne16_task_queue_size(ne16_dev_t *dev);
int ne16_task_queue_tasks_in_flight(ne16_dev_t *dev);
int ne16_task_queue_empty(ne16_dev_t *dev);
int ne16_task_queue_full(ne16_dev_t *dev);
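
The queue bookkeeping above pairs with the status-word arithmetic in `ne16.c`: one job slot is reported in bit 0 and the other in bit 8, so `NE16_STATUS_EMPTY` (0x000) maps to 0 tasks in flight and `NE16_STATUS_FULL` (0x101) to `NE16_TASK_QUEUE_SIZE` tasks. A small host-side sketch that reproduces that arithmetic (constants copied from the diff; no hardware needed):

```c
#include <assert.h>
#include <stdint.h>

#define NE16_STATUS_EMPTY (0x000)
#define NE16_STATUS_FULL (0x101)
#define NE16_TASK_QUEUE_SIZE (2)

// Same bit arithmetic as ne16_task_queue_tasks_in_flight(): one slot is
// reported in bit 0, the other in bit 8.
static int tasks_in_flight_from_status(uint32_t status) {
  return (status & 0x1) + ((status >> 8) & 0x1);
}

int main(void) {
  assert(tasks_in_flight_from_status(NE16_STATUS_EMPTY) == 0); // queue empty
  assert(tasks_in_flight_from_status(0x001) == 1);             // one slot busy
  assert(tasks_in_flight_from_status(NE16_STATUS_FULL) == NE16_TASK_QUEUE_SIZE);
  return 0;
}
```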