This repository has been archived by the owner on Oct 1, 2020. It is now read-only.

Q8GEMM per-channel quant 32bit/16bit accumulation #54

Open
wants to merge 9 commits into
base: master


@mortzur mortzur commented Mar 21, 2019

Micro-kernel implementations of the following:

  • Q8GEMM with per-channel weights quantization parameters (.c) + unit tests + benchmarks

  • Q8GEMM with per-channel weights quantization parameters for AARCH32 (.S) + unit tests + benchmarks

  • Q8GEMM with per-channel weights quantization parameters with 16bit opportunistic accumulation (.c) + unit tests + benchmarks

  • Q8GEMM with per-channel weights quantization parameters with 16bit opportunistic accumulation for AARCH32 (.S) + unit tests + benchmarks
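
For readers unfamiliar with per-channel quantization, here is a minimal scalar sketch of what such a GEMM computes with 32-bit accumulation. It is not the micro-kernel code from this PR: the function name, argument layout, and rounding rule (round half away from zero) are assumptions made for illustration, and the 16-bit accumulation variant is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the PR's kernel.
 * a: m x k uint8 activations sharing one zero point.
 * b: n x k uint8 weights, one row per output channel.
 * b_zero_point[j] and requant_scale[j] are per output channel j, where
 * requant_scale[j] is assumed to equal a_scale * b_scale[j] / c_scale. */
static void q8gemm_per_channel_ref(
    size_t m, size_t n, size_t k,
    const uint8_t* a, uint8_t a_zero_point,
    const uint8_t* b, const uint8_t* b_zero_point,
    const int32_t* bias,
    const float* requant_scale,
    uint8_t c_zero_point,
    uint8_t* c)
{
  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < n; j++) {
      /* 32-bit accumulator, seeded with the channel's bias. */
      int32_t acc = bias[j];
      for (size_t p = 0; p < k; p++) {
        const int32_t av = (int32_t) a[i * k + p] - (int32_t) a_zero_point;
        const int32_t bv = (int32_t) b[j * k + p] - (int32_t) b_zero_point[j];
        acc += av * bv;
      }
      /* Requantize with the scale of output channel j
       * (assumed small enough that the product fits in int32). */
      const float scaled = (float) acc * requant_scale[j];
      int32_t q = (int32_t) (scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
      q += (int32_t) c_zero_point;
      /* Saturate to the uint8 output range. */
      if (q < 0) q = 0;
      if (q > 255) q = 255;
      c[i * n + j] = (uint8_t) q;
    }
  }
}
```

The only difference from per-tensor quantization in this sketch is that the weight zero point and the requantization scale are indexed by the output channel `j` instead of being single scalars.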

@mortzur mortzur requested a review from hlu1 March 21, 2019 19:28
@franksun007

I believe the following changes are missing from CMakeLists.txt:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index a5ddc49..6320b1e 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -188,11 +188,15 @@ SET(QNNPACK_AARCH32_ASM_UKERNELS
   src/q8conv/4x8-aarch32-neon.S
   src/q8dwconv/up8x9-aarch32-neon.S
   src/q8gemm/4x8-aarch32-neon.S
-  src/q8gemm/4x8c2-xzp-aarch32-neon.S)
+  src/q8gemm/4x8c2-xzp-aarch32-neon.S
+  src/q8gemm/4x8-aarch32-neon-per-channel.S
+  src/q8gemm/4x8-aarch32-neon-per-channel-16bitAcc.S)
 
 SET(QNNPACK_AARCH64_ASM_UKERNELS
   src/q8conv/8x8-aarch64-neon.S
-  src/q8gemm/8x8-aarch64-neon.S)
+  src/q8gemm/8x8-aarch64-neon.S
+  src/q8gemm/4x8-neon_per_channel.c
+  src/q8gemm/4x8-neon_per_channel_16bitAcc.c)
 
 SET(QNNPACK_X86_SSE2_UKERNELS
   src/q8avgpool/mp8x9p8q-sse2.c


franksun007 commented Jun 10, 2019

Also, the tests and benchmarks failed to compile on the ARM32 platform. This might be an easy fix with an if guard.

Sorry, my bad. Only the benchmark fails to compile.
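
One way such a guard could look in C, assuming the benchmark fails because it references a kernel that is not built for the target. This is a sketch only: `BENCH_HAS_AARCH32_KERNEL`, `collect_benchmarks`, and the stub functions are hypothetical names, and `__arm__` / `__aarch64__` are the macros predefined by GCC and Clang.

```c
#include <stddef.h>

/* Hypothetical feature macro: 1 only when compiling for 32-bit ARM. */
#if defined(__arm__) && !defined(__aarch64__)
#define BENCH_HAS_AARCH32_KERNEL 1
#else
#define BENCH_HAS_AARCH32_KERNEL 0
#endif

typedef void (*bench_fn)(void);

/* Stand-in for a benchmark of a portable (.c) kernel. */
static void bench_portable_kernel(void) {}

#if BENCH_HAS_AARCH32_KERNEL
/* Stand-in for a benchmark of an AArch32 assembly (.S) kernel;
 * compiled only where that kernel would exist. */
static void bench_aarch32_kernel(void) {}
#endif

/* Registers only the benchmarks whose kernels are available on this target. */
static size_t collect_benchmarks(bench_fn* out, size_t capacity)
{
  size_t count = 0;
  if (count < capacity) out[count++] = bench_portable_kernel;
#if BENCH_HAS_AARCH32_KERNEL
  if (count < capacity) out[count++] = bench_aarch32_kernel;
#endif
  return count;
}
```

Both the definition and every reference to the architecture-specific benchmark sit behind the same macro, so a target without that kernel never sees an unresolved symbol.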
