Benchmarks 2024 02 11 TVM LLVM

Philipp van Kempen edited this page Feb 11, 2024 · 8 revisions

Setup

Simulator

Toolchains

  • LLVM/Clang:
    • TODO: Version
    • Linker: lld (TODO)
    • RISC-V GCC for Headers, libc,...

Models

Package Versions

  • MLonMCU : main

  • TVM : Nightly Pre-Build

  • Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a

  • Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2

Miscellaneous

  • Compiled with the -Os flag.
  • Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
  • Memory metrics are reported in bytes.

Results (Framework: tvm, Backend: tvmaot, Toolchain: llvm)

Audio Wake Words (aww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 33508602 (0.5x) | 109102 (1.205) | 59508 (3.097) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 27511073 (0.6x) | 102506 (1.133) | 59508 (3.097) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 13706216 (1.1x) | 102504 (1.133) | 51336 (2.672) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 27505623 (0.6x) | 102618 (1.134) | 59508 (3.097) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 3384606 (4.6x) | 105404 (1.165) | 59508 (3.097) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 3384606 (4.6x) | 105404 (1.165) | 59508 (3.097) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 9566669 (1.6x) | 103606 (1.145) | 59508 (3.097) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 6682669 (2.3x) | 103606 (1.145) | 59508 (3.097) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 5607715 (2.8x) | 106776 (1.18) | 51336 (2.672) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 3881984 (4.0x) | 106776 (1.18) | 51336 (2.672) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 9565134 (1.6x) | 104544 (1.155) | 59508 (3.097) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 6683403 (2.3x) | 104544 (1.155) | 59508 (3.097) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 15615223 (Base) | 90510 (Base) | 19212 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 6838468 (2.3x) | 93734 (1.036) | 19212 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 5367941 (2.9x) | 93734 (1.036) | 19212 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 7407276 (2.1x) | 90216 (0.997) | 23676 (1.232) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 3759264 (4.2x) | 90216 (0.997) | 23676 (1.232) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Notes

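  • TVM Fallback kernels can be vectorized easily by LLVM's auto-vectorizer.
  • Autotuning + NCHW layout + AutoVectorizer may outperform muRISCV-NN, especially for small VLENs (at VLEN 128: 5607715 cycles vs. 7407276 for muRISCV-NN in Vector mode).
  • Tuned results could improve even more drastically: tuning was done on an old TVM version and without the auto-vectorizer (TODO: replace).
  • Autotuned is sometimes worse than Fallback, e.g. for NCHW at VLEN 128 (5607715 vs. 3384606 cycles).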
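
The derived columns can be checked by hand. The sketch below assumes that "Speedup" is the baseline cycle count divided by the row's cycle count, and that the "rel." values are the row's bytes divided by the baseline's bytes, with the muRISCV-NN Scalar RV32GC row as baseline. The page does not state these formulas explicitly; they are inferred from the numbers, and the function names are made up for illustration. All cycle counts and sizes are copied from the aww table.

```python
# Values copied from the aww table on this page.
BASE = {"cycles": 15615223, "rom": 90510, "ram": 19212}  # muRISCV-NN Scalar, RV32GC
ROW = {"cycles": 33508602, "rom": 109102, "ram": 59508}  # TVM Fallback, NCHW, RV32GC


def speedup(base_cycles: int, cycles: int) -> float:
    """Assumed definition: baseline cycles divided by measured cycles."""
    return base_cycles / cycles


def relative(base_bytes: int, value_bytes: int) -> float:
    """Assumed definition: measured bytes divided by baseline bytes."""
    return value_bytes / base_bytes


row_speedup = round(speedup(BASE["cycles"], ROW["cycles"]), 1)
row_rom = round(relative(BASE["rom"], ROW["rom"]), 3)
row_ram = round(relative(BASE["ram"], ROW["ram"]), 3)

# TVM Autotuned + NCHW + Loop+SLP against muRISCV-NN Vector mode, per VLEN.
# A ratio above 1 means the TVM configuration needs fewer cycles.
tvm_tuned_nchw = {128: 5607715, 1024: 3881984}
muriscvnn_vector = {128: 7407276, 1024: 3759264}
advantage = {v: muriscvnn_vector[v] / tvm_tuned_nchw[v] for v in tvm_tuned_nchw}

print(row_speedup, row_rom, row_ram)
print({v: round(a, 2) for v, a in advantage.items()})
```

Under these assumptions the first row reproduces the table's 0.5x, 1.205 and 3.097. The ratio is above 1 at VLEN 128 and below 1 at VLEN 1024, which matches the note that the TVM advantage shows up mainly for small VLENs.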

Image Classification (resnet)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 144789099 (0.4x) | 218266 (1.582) | 108420 (1.953) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 112394400 (0.5x) | 209174 (1.516) | 108420 (1.953) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 53701817 (1.1x) | 212582 (1.541) | 92236 (1.661) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 112389724 (0.5x) | 209256 (1.517) | 108420 (1.953) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 12825990 (4.6x) | 213676 (1.549) | 108420 (1.953) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 12825991 (4.6x) | 213678 (1.549) | 108420 (1.953) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 36071697 (1.6x) | 210202 (1.524) | 108420 (1.953) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 24311825 (2.4x) | 210202 (1.524) | 108420 (1.953) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 19686220 (3.0x) | 226232 (1.64) | 92236 (1.661) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 13965591 (4.2x) | 226232 (1.64) | 92236 (1.661) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 36069954 (1.6x) | 210930 (1.529) | 108420 (1.953) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 24309153 (2.4x) | 210938 (1.529) | 108420 (1.953) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 58402822 (Base) | 137958 (Base) | 55516 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 28255398 (2.1x) | 141694 (1.027) | 55516 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 13704877 (4.3x) | 141694 (1.027) | 55516 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 27976928 (2.1x) | 138304 (1.003) | 55516 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 8035106 (7.3x) | 138304 (1.003) | 55516 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Anomaly Detection (toycar)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 3404880 (0.6x) | 581362 (1.841) | 5572 (1.168) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 3404880 (0.6x) | 581362 (1.841) | 5572 (1.168) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 2245737 (0.8x) | 609080 (1.929) | 6884 (1.443) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 2245737 (0.8x) | 609080 (1.929) | 6884 (1.443) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 984695 (1.9x) | 581106 (1.84) | 5572 (1.168) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 1280619 (1.5x) | 600432 (1.902) | 6884 (1.443) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1148032 (1.6x) | 600432 (1.902) | 6884 (1.443) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1280619 (1.5x) | 600432 (1.902) | 6884 (1.443) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1148032 (1.6x) | 600432 (1.902) | 6884 (1.443) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1893647 (Base) | 315740 (Base) | 4772 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 662593 (2.9x) | 316396 (1.002) | 4772 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 430222 (4.4x) | 316394 (1.002) | 4772 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 639280 (3.0x) | 315740 (1.0) | 4772 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 465832 (4.1x) | 315740 (1.0) | 4772 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Notes

  • For DNNs, muRISCV-NN (Vector mode) can often outperform TVM (Fallback/Tuned) + AutoVectorizer.
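
As a rough cross-check of this note, the following sketch compares, for each model at VLEN 1024, the fastest auto-vectorized TVM row with the muRISCV-NN Vector row. The cycle counts are copied from the tables on this page. The dictionary layout, the helper name and the choice of VLEN 1024 are assumptions made for illustration, not part of the benchmark tooling.

```python
# Cycles at VLEN 1024, copied from the results tables on this page.
# "tvm" is the fastest TVM row (any layout, Fallback or Autotuned) with Loop+SLP.
CYCLES_VLEN_1024 = {
    "aww": {"tvm": 3384606, "muriscvnn_vector": 3759264},
    "resnet": {"tvm": 12825991, "muriscvnn_vector": 8035106},
    "toycar": {"tvm": 984693, "muriscvnn_vector": 465832},
    "vww": {"tvm": 11010120, "muriscvnn_vector": 10453920},
}


def vector_mode_wins(results: dict) -> list:
    """Return the models where muRISCV-NN Vector mode needs fewer cycles than TVM."""
    return sorted(
        model
        for model, cycles in results.items()
        if cycles["muriscvnn_vector"] < cycles["tvm"]
    )


winners = vector_mode_wins(CYCLES_VLEN_1024)
print(winners)
```

With these numbers muRISCV-NN Vector mode is ahead for resnet, toycar and vww, while TVM remains faster for aww, which is consistent with "often" rather than "always". For toycar, the muRISCV-NN Scalar kernels with Loop+SLP (430222 cycles) are faster still than Vector mode.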

Visual Wake Words (vww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 96665970 (0.5x) | 545172 (1.685) | 181032 (2.113) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 79940191 (0.6x) | 521128 (1.611) | 181032 (2.113) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 42404608 (1.1x) | 525208 (1.623) | 181032 (2.113) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 79940191 (0.6x) | 521130 (1.611) | 181032 (2.113) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 11010120 (4.2x) | 532510 (1.646) | 181032 (2.113) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 11010120 (4.2x) | 532510 (1.646) | 181032 (2.113) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 30451803 (1.5x) | 523638 (1.618) | 181032 (2.113) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 22700929 (2.1x) | 523636 (1.618) | 181032 (2.113) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 24965516 (1.9x) | 550736 (1.702) | 181032 (2.113) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 19204882 (2.4x) | 550718 (1.702) | 181032 (2.113) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 30450316 (1.5x) | 523746 (1.619) | 181032 (2.113) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 22698968 (2.1x) | 523746 (1.619) | 181032 (2.113) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 46765906 (Base) | 323534 (Base) | 85664 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 19684202 (2.4x) | 327720 (1.013) | 85664 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 14888471 (3.1x) | 327722 (1.013) | 85664 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 21141100 (2.2x) | 324316 (1.002) | 85664 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 10453920 (4.5x) | 324314 (1.002) | 85664 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Original data

Click here to download the raw files for this benchmark.
