umd: open source compiler code

Release DLA1.2.0 compiler source code

Signed-off-by: Prashant Gaikwad
Signed-off-by: Mitch Harwell
Signed-off-by: Gunjan Mehta
Signed-off-by: Ken Adams
Signed-off-by: Arvind M
nvdla · Aug 28, 2019 · 1ae4738
1 parent 38a6300

Showing 211 changed files with 369,753 additions and 77 deletions.
diff --git a/CompilerFeatures.md b/CompilerFeatures.md
@@ -32,6 +32,13 @@
 ||EltWise MAX|&#10004;|Not implemented in SW|
 |**LRN**||&#10004;|Not implemented in SW|
 
+### Frameworks support
+
+|Framework &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|Status &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|
+|---------|-------|
+|Caffe|&#10004;|
+|ONNX|Future|
+
 ### Networks verification report
 
 |Network &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|Configuration &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|fp16 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |int8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |

diff --git a/LowPrecision.md b/LowPrecision.md
@@ -1,6 +1,6 @@
 # Low precision support in NVDLA
 
-Use of low precision such 8-bit, 4-bit, or even lower number of bits for inference is one of the optimization methods used in deep learning. NVDLA architecture includes INT8 (8-bit) precision support. It helps to compress the model reducing memory footprint and to improve performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
+Use of low precision, such as 8-bit, 4-bit, or even fewer bits, for inference is one of the optimization methods used in deep learning. It compresses the model, reducing its memory footprint, and improves performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
 
 ### NVDLA architecture for INT8 precision support includes the following:
 -	INT8 input/output data read/write
@@ -9,14 +9,24 @@ Use of low precision such 8-bit, 4-bit, or even lower number of bits for inferen
 -	Per-tensor and per-kernel output re-scaling using output converters
 
 ### Steps to generate INT8 quantized model:
--	Analyze the dynamic range of per-layer tensors and calculate scale factors
+-	Analyze the dynamic range of per-layer tensors and calculate scale factors using TensorRT
+-	Import the scale factors generated by TensorRT into NVDLA JSON format
 -	Quantize model weights and determine the converter parameters using scale factors
 
-#### Analyze dynamic range of per-layer tensors and calculate scale factors
-A calibration tool can collect the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. The NVDLA Compiler uses the following JSON schema to import scale factors.
+#### Analyze dynamic range of per-layer tensors and calculate scale factors using TensorRT
+A calibration tool collects the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. For NVDLA, the TensorRT calibration interface is used to generate scale factors.
+
+Refer to https://github.com/NVIDIA/TensorRT/tree/release/5.1/samples/opensource/sampleINT8 for a sample application that explains how to use TensorRT to generate scale factors.
+
+Notes:
+-	Use IInt8EntropyCalibrator2 for calibration.
+-	Dump the calibration scales using writeCalibrationCache() so that they can be converted to NVDLA JSON format.
+-	Do not set --useDLACore for calibration; that option generates a TensorRT runtime engine for NVIDIA Xavier platforms, such as NVIDIA Jetson AGX Xavier, which have NVDLA integrated.
 
 ##### JSON schema for calibration table
 
+The NVDLA Compiler uses the following JSON schema to import scale factors generated from TensorRT.
+
 ```
 {
     "type" : "object",
@@ -45,37 +55,18 @@ A calibration tool can collect the dynamic range of the output tensor for each l
 }
 ```
 
-##### Sample calibration table for first few layers of ResNet-50 using symmetric scaling
+##### How to convert the calibration cache dump to NVDLA JSON format?
 
-```
-{
-	"data" : {
-		"scale": 0.00781453,
-		"min": 0,
-		"max": 0,
-		"offset": 0
-	},
-	"conv1" : {
-		"scale": 0.0891214,
-		"min": 0,
-		"max": 0,
-		"offset": 0
-	},
-	"pool1" : {
-		"scale": 0.0891214,
-		"min": 0,
-		"max": 0,
-		"offset": 0
-	},
-	"res2a_branch1" : {
-		"scale": 0.119546,
-		"min": 0,
-		"max": 0,
-		"offset": 0
-	}
-}
-```
+[calib_txt_to_json.py](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib_txt_to_json.py) can be used to convert calibration scales generated from TensorRT to NVDLA JSON format.
 
 #### Quantize model weights and determine the converter parameters
 
-The NVDLA Compiler has the ability to quantize model weights and determine the converter parameters using the scale factors from the calibration table.
+The NVDLA Compiler has the ability to quantize model weights and determine the converter parameters using the scale factors from the calibration table.
+
+Use the --calibtable argument to pass a calibration table generated from TensorRT as input to the NVDLA compiler.
+
+#### Example
+
+A sample calibration table for the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) is available at [calib.json](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib.json).
+
+This calibration table can be used with the NVDLA compiler and the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) to run ResNet-50 on the NVDLA INT8 configuration.
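The cache-to-JSON conversion step can be pictured with a small C++ sketch. It assumes, without confirmation from this commit, that the cache written by writeCalibrationCache() is plain text: one header line followed by one `tensorName: hexValue` line per tensor, where hexValue is the 8-digit hex encoding of the float32 scale's IEEE-754 bits. `decodeHexScale`, `trim`, and `cacheToJson` are hypothetical helpers, not part of the NVDLA sw tree; calib_txt_to_json.py remains the supported tool. The min, max, and offset fields are written as 0, as in the symmetric-scaling ResNet-50 sample this commit removes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Decode the hex text of one cache entry into a float32 scale.
// ASSUMPTION: the text is the most-significant-digit-first hex encoding
// of the float's IEEE-754 bit pattern.
float decodeHexScale(const std::string& hex)
{
    assert(!hex.empty() && "caller must strip whitespace first");
    // std::stoul throws std::invalid_argument if the text is not hex.
    const std::uint32_t bits =
        static_cast<std::uint32_t>(std::stoul(hex, nullptr, 16));
    float scale;
    static_assert(sizeof(scale) == sizeof(bits), "float must be 32 bits");
    std::memcpy(&scale, &bits, sizeof(scale));  // reinterpret the bits safely
    return scale;
}

// Remove leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos)
        return "";
    const std::size_t last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Convert a calibration cache (header line, then "tensorName: hex" lines)
// into JSON with the scale/min/max/offset fields of the NVDLA schema.
// Tensor names are emitted as-is, without JSON escaping.
std::string cacheToJson(std::istream& in)
{
    std::string line;
    std::getline(in, line);  // skip the calibrator/version header line

    std::ostringstream out;
    out << std::setprecision(9) << "{\n";
    bool first = true;
    while (std::getline(in, line)) {
        const std::size_t colon = line.rfind(':');
        if (colon == std::string::npos)
            continue;  // skip blank or malformed lines
        const std::string name = trim(line.substr(0, colon));
        const std::string hex = trim(line.substr(colon + 1));
        if (name.empty() || hex.empty())
            continue;
        if (!first)
            out << ",\n";
        first = false;
        out << "\t\"" << name << "\" : {\n"
            << "\t\t\"scale\": " << decodeHexScale(hex) << ",\n"
            << "\t\t\"min\": 0,\n"
            << "\t\t\"max\": 0,\n"
            << "\t\t\"offset\": 0\n"
            << "\t}";
    }
    out << "\n}\n";
    return out.str();
}
```

For example, a cache entry `data: 3c000000` yields a `"data"` entry with `"scale": 0.0078125`, because 0x3c000000 is the IEEE-754 encoding of 2^-7.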
diff --git a/Roadmap.md b/Roadmap.md
@@ -0,0 +1,25 @@
+# NVDLA Roadmap
+
+### DLA 1.3.0
+
+- HW Multibatch for FC layers
+- Multi-input network support
+- Support different precision and format for input tensors
+- Buffer pre-registration
+- INT8 deconvolution
+- Deconvolution optimization
+- Support deconvolution with stride > 32
+- INT8 group convolution
+- Depthwise convolution optimization
+- ReLU-N
+- Machine Translation Layer (MTL)
+
+Note: APIs are expected to change in DLA1.3.0
+
+### Future
+
+- Memory optimizations
+- ONNX
+- Sample application for accuracy
+- Sample application for object detection
+
diff --git a/umd/Makefile b/umd/Makefile
@@ -1,4 +1,4 @@
-# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -25,11 +25,19 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 
-SUBDIRS = core/runtime \
-	  tests/runtime
+COMPILER_SUBDIRS = core/src/compiler \
+	apps/compiler
 
-subdirs:
-	for dir in $(SUBDIRS); do \
+RUNTIME_SUBDIRS = core/src/runtime \
+	apps/runtime
+
+compiler:
+	for dir in $(COMPILER_SUBDIRS); do \
+		$(MAKE) -C $$dir; \
+	done
+
+runtime:
+	for dir in $(RUNTIME_SUBDIRS); do \
 		$(MAKE) -C $$dir; \
 	done
 

diff --git a/umd/apps/compiler/CompileTest.cpp b/umd/apps/compiler/CompileTest.cpp
@@ -0,0 +1,121 @@
+/*
+ * Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *  * Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *  * Neither the name of NVIDIA CORPORATION nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "main.h"
+
+#include "nvdla/IProfile.h"
+#include "nvdla/IProfiler.h"
+#include "nvdla/IWisdom.h"
+#include "nvdla/INetwork.h"
+#include "nvdla/ICompiler.h"
+#include "nvdla/ITargetConfig.h"
+
+#include "ErrorMacros.h"
+#include "nvdla_os_inf.h"
+
+NvDlaError compileProfile(const TestAppArgs* appArgs, TestInfo* i)
+{
+    NvDlaError e = NvDlaSuccess;
+    std::string profileName = "";
+    std::string targetConfigName = "";
+
+    NvDlaFileHandle file = 0;
+    std::string fileName = "";
+    NvU8 *buffer = 0;
+    NvU64 size = 0;
+
+    nvdla::ICompiler* compiler = i->wisdom->getCompiler();
+    if (!compiler)
+        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->getCompiler() failed");
+
+    if (appArgs->configtarget.empty())
+        ORIGINATE_ERROR_FAIL(NvDlaError_NotInitialized, "No target config found to load");
+
+    targetConfigName = appArgs->configtarget;
+
+    // Determine profile
+    PROPAGATE_ERROR_FAIL(generateProfile(appArgs, &profileName, i));
+
+    // Compile
+    NvDlaDebugPrintf("compiling profile \"%s\"... config \"%s\"...\n", profileName.c_str(), targetConfigName.c_str());
+    PROPAGATE_ERROR_FAIL(compiler->compile(profileName.c_str(), targetConfigName.c_str(), &i->compiledLoadable));
+
+    // Get loadable buffer and dump it into a file
+    PROPAGATE_ERROR_FAIL(compiler->getLoadableImageSize(profileName.c_str(),
+                                                    &size));
+    if (size == 0) {
+        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter,
+                            "Invalid size for a loadable");
+    }
+
+    buffer = (NvU8 *) NvDlaAlloc(size);
+    if (buffer == NULL) {
+        ORIGINATE_ERROR_FAIL(NvDlaError_InsufficientMemory,
+                            "Failed to allocate buffer for loadable");
+    }
+    PROPAGATE_ERROR_FAIL(compiler->getLoadableImage(profileName.c_str(),
+                                                    buffer));
+    fileName = profileName + ".nvdla";
+    PROPAGATE_ERROR_FAIL(NvDlaFopen(fileName.c_str(), NVDLA_OPEN_WRITE, &file));
+    PROPAGATE_ERROR_FAIL(NvDlaFwrite(file, buffer, size));
+
+fail:
+    NvDlaFclose(file);
+    if (buffer != NULL)
+        NvDlaFree(buffer);
+    return e;
+}
+
+NvDlaError compile(const TestAppArgs* appArgs, TestInfo* i)
+{
+    NvDlaError e = NvDlaSuccess;
+
+    i->compiledLoadable = 0;
+
+    NvDlaDebugPrintf("creating new wisdom context...\n");
+    i->wisdom = nvdla::createWisdom();
+    if (!i->wisdom)
+        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "createWisdom() failed");
+
+    NvDlaDebugPrintf("opening wisdom context...\n");
+    if (!i->wisdom->open(i->wisdomPath))
+        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->open() failed to open: \"%s\"", i->wisdomPath.c_str());
+
+    // Compile
+    PROPAGATE_ERROR_FAIL(compileProfile(appArgs, i));
+
+    NvDlaDebugPrintf("closing wisdom context...\n");
+    i->wisdom->close();
+
+fail:
+    if (i->wisdom != NULL) {
+        nvdla::destroyWisdom(i->wisdom);
+        i->wisdom = NULL;
+    }
+    return e;
+}
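compileProfile() follows a size-then-fill pattern: it asks the compiler for the loadable size, allocates a buffer, copies the image into it, and writes `<profile>.nvdla`, releasing resources under the `fail:` label. A minimal sketch of the same flow in standard C++, using RAII instead of the label, is shown here. `LoadableSource`, `InMemoryLoadable`, and `dumpLoadable` are hypothetical names for illustration only; the real nvdla::ICompiler methods take a `const char*` profile name and NvU64/NvU8 arguments and return NvDlaError.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the two nvdla::ICompiler calls used by
// compileProfile(); the real methods return NvDlaError and use NvU64/NvU8.
struct LoadableSource {
    virtual ~LoadableSource() = default;
    virtual bool getLoadableImageSize(const std::string& profile,
                                      std::uint64_t* size) = 0;
    virtual bool getLoadableImage(const std::string& profile,
                                  std::uint8_t* buffer) = 0;
};

// Size-then-fill: query the size, allocate, copy the image, and write it to
// "<profile>.nvdla". The vector and ofstream clean up on every return path.
bool dumpLoadable(LoadableSource& src, const std::string& profile)
{
    std::uint64_t size = 0;
    if (!src.getLoadableImageSize(profile, &size) || size == 0)
        return false;  // same rejection of an empty loadable as compileProfile()

    std::vector<std::uint8_t> buffer(static_cast<std::size_t>(size));
    if (!src.getLoadableImage(profile, buffer.data()))
        return false;

    std::ofstream out(profile + ".nvdla", std::ios::binary);
    if (!out)
        return false;
    out.write(reinterpret_cast<const char*>(buffer.data()),
              static_cast<std::streamsize>(buffer.size()));
    return static_cast<bool>(out);
}

// Hypothetical in-memory source, useful for exercising dumpLoadable().
struct InMemoryLoadable : LoadableSource {
    std::vector<std::uint8_t> image;

    bool getLoadableImageSize(const std::string&, std::uint64_t* size) override
    {
        *size = image.size();
        return true;
    }
    bool getLoadableImage(const std::string&, std::uint8_t* buffer) override
    {
        assert(buffer != nullptr);
        std::copy(image.begin(), image.end(), buffer);
        return true;
    }
};
```

Because the buffer and the output stream are destroyed automatically, no return path can leak the allocation or leave the file open, which is the job the `fail:` block does by hand in the original.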