diff --git a/.doctrees/environment.pickle b/.doctrees/environment.pickle
index cae0ed9a..ba21efa1 100644
Binary files a/.doctrees/environment.pickle and b/.doctrees/environment.pickle differ
diff --git a/.doctrees/hardware.doctree b/.doctrees/hardware.doctree
index f840bba5..9e956880 100644
Binary files a/.doctrees/hardware.doctree and b/.doctrees/hardware.doctree differ
diff --git a/_sources/hardware.rst.txt b/_sources/hardware.rst.txt
index 28c10c9d..7911c4ba 100644
--- a/_sources/hardware.rst.txt
+++ b/_sources/hardware.rst.txt
@@ -1,56 +1,59 @@
 Hardware Architecture
 =====================
 
-In this section we introduce the general concept of how HW accelerators are modelled within ZigZag and the different well-known accelerators we provide as examples. We start from the smallest building block defined in ZigZag and work our way up towards an accelerator.
+In this section, we introduce the general concept of how HW accelerators are modeled within ZigZag and the different well-known accelerators we provide as examples. We start from the smallest building block defined in ZigZag and work our way up towards an accelerator.
 
 Operational Unit
 ----------------
 
-Accelerating inference of a NN requires execution of multiplications and summations (accumulations) across multiple intermediate data (activations) using trained parameters (weights). The operational unit, typically a Multiplier, executes the multiplication of two data elements, typically an activation and a weight. 
+Accelerating inference of a NN requires the execution of multiplications and summations (accumulations) across multiple intermediate data (activations) using trained parameters (weights). The operational unit, typically a Multiplier, executes the multiplication of two data elements, typically an activation and a weight. 
 
 .. image:: images/hardware-architecture/operational-unit.jpg
   :width: 400
 
-The operational unit object has following attributes:
+The operational unit object has the following attributes:
 
-* **input_precision**: List of input operand (data) precision in number of bits for each input operand (typically 2 for Multiplier).
+* **input_precision**: List of the input operand (data) precisions in number of bits, with one entry per input operand (a Multiplier typically has two input operands).
 * **output_precision**: The bit precision of the operation's output.
-* **energy_cost**: Energy of executing a single multiplication.
-* **area**: The HW area overhead of a single multiplier.
+* **energy_cost**: Energy of executing a single operation (e.g., a multiplication).
+* **area**: The HW area overhead of a single operational unit (e.g., a multiplier).
 
 Operational Array
 -----------------
 
-Inferencing a NN typically requires millions of operations, and an accelerator typically includes an array of operational units that can execute these operations. This can speed significantly up the computations, as well as increase energy efficiency which is covered later.
+Running inference on a NN typically requires millions of operations, and an accelerator typically includes an array of operational units that can execute these operations in parallel. This can significantly speed up the computations and increase energy efficiency, which is covered later.
 
-The array has multiple dimensions, each with a size. The importance of these dimensions is explained in the introduction of the memory hierarchy.
+The array can have one or multiple dimensions, each with a size. The importance of these dimensions is explained in the introduction of the memory hierarchy.
 
 .. image:: images/hardware-architecture/operational-array.jpg
   :width: 400
 
-The operational array object has:
+The operational array object has the following attributes:
 
 * **operational_unit**: The operational unit from which the array is built.
-* **dimensions**: The dimensions of the array. This should be defined as a dict, with the keys being the identifier of each dimension of the array (typically 'D1', 'D2, ...) and the values being the size of this dimension (i.e. the size of the array along that dimension).
+* **dimensions**: The dimensions of the array. This should be defined as a dict, with the keys being the identifier of each dimension of the array (typically 'D1', 'D2', ...) and the values being the size of this dimension (i.e. the size of the array along that dimension).
 
 
 Memory Instance
 ---------------
 
-In order to store the different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. The instances define how big each memory is in terms of capacity and area overhead, what the cost of writing and reading from these memories is, what it's bandwidth is, and how many read/write/read-write ports it includes.
+In order to store the different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. The instances define how big each memory is in terms of capacity and area, what the cost of writing and reading from these memories is, what its bandwidth is, and how many read/write/read-write ports it includes.
 
 .. image:: images/hardware-architecture/memory-instance.jpg
   :width: 400
 
-The memory instance object has:
+The memory instance object has the following attributes:
 
 * **name**: A name for the instance
 * **size**: The memory size in bits.
-* **r_bw/w_bw**: A read and write bandwidth in number of bits per cycle.
-* **r_cost/w_cost**: A read and write energy cost.
+* **r_bw/w_bw**: The read or write bandwidth in number of bits per cycle.
+* **r_cost/w_cost**: The read or write energy cost.
 * **area**: Area overhead of the instance.
 * **r_port/w_port/rw_port**: The number of read/write/read-write ports the instance has available.
-* **latency**: The latency of an access in number of cycles.
+* **latency**: The latency of a memory access in number of cycles, i.e., how many cycles the memory takes to provide/receive the corresponding data after a read/write of a memory address is requested. (For now, this attribute is not actively used. We assume that it is 1 to model the data prefetching behavior that the deterministic dataflow makes possible.)
+* **min_r_granularity/min_w_granularity** (optional): The minimal read/write granularity (in bits) that the memory supports. This attribute is used to better model memories that support half-word or quarter-word access patterns.
+  For example, if a memory's read bandwidth (word length) is 256 bits/cycle, its read energy (r_cost) is 100, and its min_r_granularity is 128 bits (i.e., the memory allows half-word reads), reading 128 bits from it costs only 50 energy.
+  If min_r_granularity is not defined, or is defined as 256 bits, reading 128 bits from it costs 100 energy.
 
 Memory Hierarchy
 ----------------
@@ -97,7 +100,7 @@ The core object includes:
 HW Accelerator Model
 --------------------
 
-Multiple cores are combined together into the HW Accelerator, which is the main object modelling the HW behaviour.
+Multiple cores are combined together into the HW Accelerator, which is the main object modeling the HW behavior.
 
 The accelerator object includes:
 
diff --git a/hardware.html b/hardware.html
index 6a742df8..973b1705 100644
--- a/hardware.html
+++ b/hardware.html
@@ -1 +1 @@
-<!DOCTYPE html> <html lang=en > <meta charset=utf-8  /> <meta name=viewport  content="width=device-width, initial-scale=1.0" /><meta name=viewport  content="width=device-width, initial-scale=1" /> <meta name=viewport  content="width=device-width,initial-scale=1"> <meta http-equiv=x-ua-compatible  content="ie=edge"> <meta name="lang:clipboard.copy" content="Copy to clipboard"> <meta name="lang:clipboard.copied" content="Copied to clipboard"> <meta name="lang:search.language" content=en > <meta name="lang:search.pipeline.stopwords" content=True > <meta name="lang:search.pipeline.trimmer" content=True > <meta name="lang:search.result.none" content="No matching documents"> <meta name="lang:search.result.one" content="1 matching document"> <meta name="lang:search.result.other" content="# matching documents"> <meta name="lang:search.tokenizer" content="[\s\-]+"> <link href="https://fonts.gstatic.com/" rel=preconnect  crossorigin> <link href="https://fonts.googleapis.com/css?family=Roboto+Mono:400,500,700|Roboto:300,400,400i,700&display=fallback" rel=stylesheet > <style> body, input { font-family: "Roboto", "Helvetica Neue", Helvetica, Arial, sans-serif } code, kbd, pre { font-family: "Roboto Mono", "Courier New", Courier, monospace } </style> <link rel=stylesheet  href="_static/stylesheets/application.css"/> <link rel=stylesheet  href="_static/stylesheets/application-palette.css"/> <link rel=stylesheet  href="_static/stylesheets/application-fixes.css"/> <link rel=stylesheet  href="_static/fonts/material-icons.css"/> <meta name=theme-color  content="#3f51b5"> <script src="_static/javascripts/modernizr.js"></script> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXXX"></script> <script> window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag('js', new Date()); gtag('config', 'UA-XXXXX'); </script> <title>Hardware Architecture &#8212; ZigZag 2.0.0 documentation</title> <link rel=stylesheet  type="text/css" 
href="_static/pygments.css?v=83e35b93" /> <link rel=stylesheet  type="text/css" href="_static/material.css?v=79c92029" /> <script data-url_root="./" id=documentation_options  src="_static/documentation_options.js?v=73cda6fb"></script> <script src="_static/doctools.js?v=888ff710"></script> <script src="_static/sphinx_highlight.js?v=4825356b"></script> <link rel=icon  href="_static/zigzag_logo_white_32x32.svg"/> <link rel=index  title=Index  href=genindex.html  /> <link rel=search  title=Search  href=search.html  /> <link rel=next  title=Mapping  href=mapping.html  /> <link rel=prev  title=Workload  href=workload.html  /> <body dir=ltr data-md-color-primary=blue-grey data-md-color-accent=grey> <svg class=md-svg > <defs data-children-count=0 > <svg xmlns="http://www.w3.org/2000/svg" width=416  height=448  viewBox="0 0 416 448" id=__github ><path fill=currentColor  d="M160 304q0 10-3.125 20.5t-10.75 19T128 352t-18.125-8.5-10.75-19T96 304t3.125-20.5 10.75-19T128 256t18.125 8.5 10.75 19T160 304zm160 0q0 10-3.125 20.5t-10.75 19T288 352t-18.125-8.5-10.75-19T256 304t3.125-20.5 10.75-19T288 256t18.125 8.5 10.75 19T320 304zm40 0q0-30-17.25-51T296 232q-10.25 0-48.75 5.25Q229.5 240 208 240t-39.25-2.75Q130.75 232 120 232q-29.5 0-46.75 21T56 304q0 22 8 38.375t20.25 25.75 30.5 15 35 7.375 37.25 1.75h42q20.5 0 37.25-1.75t35-7.375 30.5-15 20.25-25.75T360 304zm56-44q0 51.75-15.25 82.75-9.5 19.25-26.375 33.25t-35.25 21.5-42.5 11.875-42.875 5.5T212 416q-19.5 0-35.5-.75t-36.875-3.125-38.125-7.5-34.25-12.875T37 371.5t-21.5-28.75Q0 312 0 260q0-59.25 34-99-6.75-20.5-6.75-42.5 0-29 12.75-54.5 27 0 47.5 9.875t47.25 30.875Q171.5 96 212 96q37 0 70 8 26.25-20.5 46.75-30.25T376 64q12.75 25.5 12.75 54.5 0 21.75-6.75 42 34 40 34 99.5z"/></svg> </defs> </svg> <input class=md-toggle  data-md-toggle=drawer  type=checkbox  id=__drawer > <input class=md-toggle  data-md-toggle=search  type=checkbox  id=__search > <label class=md-overlay  data-md-component=overlay  for=__drawer ></label> <a 
href="#hardware" tabindex=1  class=md-skip > Skip to content </a> <header class=md-header  data-md-component=header > <nav class="md-header-nav md-grid"> <div class="md-flex navheader"> <div class="md-flex__cell md-flex__cell--shrink"> <a href=index.html  title="ZigZag 2.0.0 documentation" class="md-header-nav__button md-logo"> &nbsp; </a> </div> <div class="md-flex__cell md-flex__cell--shrink"> <label class="md-icon md-icon--menu md-header-nav__button" for=__drawer ></label> </div> <div class="md-flex__cell md-flex__cell--stretch"> <div class="md-flex__ellipsis md-header-nav__title" data-md-component=title > <span class=md-header-nav__topic >ZigZag Framework</span> <span class=md-header-nav__topic > Hardware Architecture </span> </div> </div> <div class="md-flex__cell md-flex__cell--shrink"> <label class="md-icon md-icon--search md-header-nav__button" for=__search ></label> <div class=md-search  data-md-component=search  role=dialog > <label class=md-search__overlay  for=__search ></label> <div class=md-search__inner  role=search > <form class=md-search__form  action=search.html  method=get  name=search > <input type=text  class=md-search__input  name=q  placeholder=""Search"" autocapitalize=off  autocomplete=off  spellcheck=false  data-md-component=query  data-md-state=active > <label class="md-icon md-search__icon" for=__search ></label> <button type=reset  class="md-icon md-search__icon" data-md-component=reset  tabindex=-1 > &#xE5CD; </button> </form> <div class=md-search__output > <div class=md-search__scrollwrap  data-md-scrollfix> <div class=md-search-result  data-md-component=result > <div class=md-search-result__meta > Type to start searching </div> <ol class=md-search-result__list ></ol> </div> </div> </div> </div> </div> </div> <div class="md-flex__cell md-flex__cell--shrink"> <div class=md-header-nav__source > <a href="https://github.com/kuleuven-micas/zigzag" title="Go to repository" class=md-source  data-md-source=github > <div class=md-source__icon 
> <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" width=28  height=28 > <use xlink:href="#__github" width=24  height=24 ></use> </svg> </div> <div class=md-source__repository > ZigZag Framework </div> </a> </div> </div> <script src="_static/javascripts/version_dropdown.js"></script> <script> var json_loc = ""versions.json"", target_loc = "../", text = "Versions"; $( document ).ready( add_version_dropdown(json_loc, target_loc, text)); </script> </div> </nav> </header> <div class=md-container > <nav class=md-tabs  data-md-component=tabs > <div class="md-tabs__inner md-grid"> <ul class=md-tabs__list > <li class=md-tabs__item ><a href=index.html  class=md-tabs__link >ZigZag 2.0.0 documentation</a> <li class=md-tabs__item ><a href=user-guide.html  class=md-tabs__link >User Guide</a> </ul> </div> </nav> <main class=md-main > <div class="md-main__inner md-grid" data-md-component=container > <div class="md-sidebar md-sidebar--primary" data-md-component=navigation > <div class=md-sidebar__scrollwrap > <div class=md-sidebar__inner > <nav class="md-nav md-nav--primary" data-md-level=0 > <label class="md-nav__title md-nav__title--site" for=__drawer > <a href=index.html  title="ZigZag 2.0.0 documentation" class="md-nav__button md-logo"> <img src="_static/" alt=" logo" width=48  height=48 > </a> <a href=index.html  title="ZigZag 2.0.0 documentation">ZigZag Framework</a> </label> <div class=md-nav__source > <a href="https://github.com/kuleuven-micas/zigzag" title="Go to repository" class=md-source  data-md-source=github > <div class=md-source__icon > <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" width=28  height=28 > <use xlink:href="#__github" width=24  height=24 ></use> </svg> </div> <div class=md-source__repository > ZigZag Framework </div> </a> </div> <ul class=md-nav__list > <li class=md-nav__item > <span class="md-nav__link caption"><span class=caption-text 
>Contents:</span></span> <li class=md-nav__item > <a href=installation.html  class=md-nav__link >Installing ZigZag</a> <li class=md-nav__item > <a href=getting-started.html  class=md-nav__link >Getting Started</a> <li class=md-nav__item > <a href=api.html  class=md-nav__link >ZigZag API</a> <li class=md-nav__item > <a href=user-guide.html  class=md-nav__link >User Guide</a> <ul class=md-nav__list > <li class=md-nav__item > <a href=workload.html  class=md-nav__link >Workload</a> <li class=md-nav__item > <input class="md-toggle md-nav__toggle" data-md-toggle=toc  type=checkbox  id=__toc > <label class="md-nav__link md-nav__link--active" for=__toc > Hardware Architecture </label> <a href="#" class="md-nav__link md-nav__link--active">Hardware Architecture</a> <nav class="md-nav md-nav--secondary"> <label class=md-nav__title  for=__toc >"Contents"</label> <ul class=md-nav__list  data-md-scrollfix=""> <li class=md-nav__item ><a href="#hardware--page-root" class=md-nav__link >Hardware Architecture</a><nav class=md-nav > <ul class=md-nav__list > <li class=md-nav__item ><a href="#operational-unit" class=md-nav__link >Operational Unit</a> <li class=md-nav__item ><a href="#operational-array" class=md-nav__link >Operational Array</a> <li class=md-nav__item ><a href="#memory-instance" class=md-nav__link >Memory Instance</a> <li class=md-nav__item ><a href="#memory-hierarchy" class=md-nav__link >Memory Hierarchy</a> <li class=md-nav__item ><a href="#core" class=md-nav__link >Core</a> <li class=md-nav__item ><a href="#hw-accelerator-model" class=md-nav__link >HW Accelerator Model</a> <li class=md-nav__item ><a href="#modelled-examples" class=md-nav__link >Modelled examples</a> <li class=md-nav__item ><a href="#specific-settings" class=md-nav__link >Specific settings</a> <li class=md-nav__item ><a href="#references" class=md-nav__link >References</a> </ul> </nav> <li class=md-nav__item ><a class=md-nav__extra_link  href="_sources/hardware.rst.txt">Show Source</a> </ul> </nav> <li 
class=md-nav__item > <a href=mapping.html  class=md-nav__link >Mapping</a> <li class=md-nav__item > <a href=stages.html  class=md-nav__link >Stages</a> <li class=md-nav__item > <a href=outputs.html  class=md-nav__link >Outputs</a> </ul> <li class=md-nav__item > <a href=future.html  class=md-nav__link >Future changes</a> <li class=md-nav__item > <a href=contribute.html  class=md-nav__link >Contribute</a> <li class=md-nav__item > <a href=publications.html  class=md-nav__link >Publications</a> <li class=md-nav__item > <a href=code-documentation.html  class=md-nav__link >Code Documentation</a> </ul> </nav> </div> </div> </div> <div class="md-sidebar md-sidebar--secondary" data-md-component=toc > <div class=md-sidebar__scrollwrap > <div class=md-sidebar__inner > <nav class="md-nav md-nav--secondary"> <label class=md-nav__title  for=__toc >"Contents"</label> <ul class=md-nav__list  data-md-scrollfix=""> <li class=md-nav__item ><a href="#hardware--page-root" class=md-nav__link >Hardware Architecture</a><nav class=md-nav > <ul class=md-nav__list > <li class=md-nav__item ><a href="#operational-unit" class=md-nav__link >Operational Unit</a> <li class=md-nav__item ><a href="#operational-array" class=md-nav__link >Operational Array</a> <li class=md-nav__item ><a href="#memory-instance" class=md-nav__link >Memory Instance</a> <li class=md-nav__item ><a href="#memory-hierarchy" class=md-nav__link >Memory Hierarchy</a> <li class=md-nav__item ><a href="#core" class=md-nav__link >Core</a> <li class=md-nav__item ><a href="#hw-accelerator-model" class=md-nav__link >HW Accelerator Model</a> <li class=md-nav__item ><a href="#modelled-examples" class=md-nav__link >Modelled examples</a> <li class=md-nav__item ><a href="#specific-settings" class=md-nav__link >Specific settings</a> <li class=md-nav__item ><a href="#references" class=md-nav__link >References</a> </ul> </nav> <li class=md-nav__item ><a class=md-nav__extra_link  href="_sources/hardware.rst.txt">Show Source</a> <li 
id=searchbox  class=md-nav__item > </ul> </nav> </div> </div> </div> <div class=md-content > <article class="md-content__inner md-typeset" role=main > <section id=hardware-architecture > <h1 id=hardware--page-root >Hardware Architecture<a class=headerlink  href="#hardware--page-root" title="Permalink to this heading">¶</a></h1> <p>In this section we introduce the general concept of how HW accelerators are modelled within ZigZag and the different well-known accelerators we provide as examples. We start from the smallest building block defined in ZigZag and work our way up towards an accelerator.</p> <section id=operational-unit > <h2 id=operational-unit >Operational Unit<a class=headerlink  href="#operational-unit" title="Permalink to this heading">¶</a></h2> <p>Accelerating inference of a NN requires execution of multiplications and summations (accumulations) across multiple intermediate data (activations) using trained parameters (weights). The operational unit, typically a Multiplier, executes the multiplication of two data elements, typically an activation and a weight.</p> <a class="reference internal image-reference" href="_images/operational-unit.jpg"><img alt="_images/operational-unit.jpg" src="_images/operational-unit.jpg" style="width: 400px;"/></a> <p>The operational unit object has following attributes:</p> <ul class=simple > <li><p><strong>input_precision</strong>: List of input operand (data) precision in number of bits for each input operand (typically 2 for Multiplier).</p> <li><p><strong>output_precision</strong>: The bit precision of the operation’s output.</p> <li><p><strong>energy_cost</strong>: Energy of executing a single multiplication.</p> <li><p><strong>area</strong>: The HW area overhead of a single multiplier.</p> </ul> </section> <section id=operational-array > <h2 id=operational-array >Operational Array<a class=headerlink  href="#operational-array" title="Permalink to this heading">¶</a></h2> <p>Inferencing a NN typically requires 
millions of operations, and an accelerator typically includes an array of operational units that can execute these operations. This can speed significantly up the computations, as well as increase energy efficiency which is covered later.</p> <p>The array has multiple dimensions, each with a size. The importance of these dimensions is explained in the introduction of the memory hierarchy.</p> <a class="reference internal image-reference" href="_images/operational-array.jpg"><img alt="_images/operational-array.jpg" src="_images/operational-array.jpg" style="width: 400px;"/></a> <p>The operational array object has:</p> <ul class=simple > <li><p><strong>operational_unit</strong>: The operational unit from which the array is built.</p> <li><p><strong>dimensions</strong>: The dimensions of the array. This should be defined as a dict, with the keys being the identifier of each dimension of the array (typically ‘D1’, ‘D2, …) and the values being the size of this dimension (i.e. the size of the array along that dimension).</p> </ul> </section> <section id=memory-instance > <h2 id=memory-instance >Memory Instance<a class=headerlink  href="#memory-instance" title="Permalink to this heading">¶</a></h2> <p>In order to store the different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. 
The instances define how big each memory is in terms of capacity and area overhead, what the cost of writing and reading from these memories is, what it’s bandwidth is, and how many read/write/read-write ports it includes.</p> <a class="reference internal image-reference" href="_images/memory-instance.jpg"><img alt="_images/memory-instance.jpg" src="_images/memory-instance.jpg" style="width: 400px;"/></a> <p>The memory instance object has:</p> <ul class=simple > <li><p><strong>name</strong>: A name for the instance</p> <li><p><strong>size</strong>: The memory size in bits.</p> <li><p><strong>r_bw/w_bw</strong>: A read and write bandwidth in number of bits per cycle.</p> <li><p><strong>r_cost/w_cost</strong>: A read and write energy cost.</p> <li><p><strong>area</strong>: Area overhead of the instance.</p> <li><p><strong>r_port/w_port/rw_port</strong>: The number of read/write/read-write ports the instance has available.</p> <li><p><strong>latency</strong>: The latency of an access in number of cycles.</p> </ul> </section> <section id=memory-hierarchy > <h2 id=memory-hierarchy >Memory Hierarchy<a class=headerlink  href="#memory-hierarchy" title="Permalink to this heading">¶</a></h2> <p>Besides knowing what the specs of each memory instance are, the memory hierarchy encodes information with respect to the interconnection of the memories to the operational array, and to the other memory instances. This interconnection is achieved through multiple calls to the <cite>add_memory()</cite>, where the first call(s) adds the first level of memories, which connects to the operational array, and later calls connect to the lower memory levels. This builds a hierarchy of memories.</p> <p>To know if the memory should connect to the operational array or another lower memory level, it needs to know which data will be stored within the memories. 
To decouple the algorithmic side from the hardware side, this is achieved through the concept of ‘memory operands’ (as opposed to ‘algorithmic operands which are typicall the I/O activations and weights W). You can think of the memory operands as virtual operands, which will later be linked to the actual algorithmic operands in the mapping file through the <cite>memory_operand_links</cite> attribute.</p> <p>Similarly to how the operational unit can be unrolled (forming an operational array), the memories can also be unrolled, where each memory accompanies either a single operational unit or all the operational units in one or more dimensions of the operational array. This is encoded through the <cite>served_dimensions</cite> attribute, which specifies if a single memory instance of this memory level serves all operational units in that dimension. This should be a set of one-hot-encoded tuples.</p> <p>Lastly, the different read/write/read-write ports a memory instance has, are assigned to the different data movevements possible in the hierarchy. There are four types of data movements in a hierarchy: from high (<em>fh</em>), to high (<em>th</em>), from low (<em>fl</em>), to low (<em>tl</em>). At the time of writing, these can be manually linked to one of the read/write/read-write ports through the following syntax: <cite>{port_type}_port_{port_number}</cite>, <em>port_type</em> being <em>r</em>, <em>w</em> or <em>rw</em> and <em>port_number</em> equal to the port number, starting from 1, which allows to allocate multiple ports of the same type. 
Alternatively, these are automatically generated as a default if not probided to the <cite>add_memory()</cite> call.</p> <p>Internally, the MemoryHierarchy object extends the <a class="reference external" href="https://networkx.org/documentation/stable/reference/classes/digraph.html">NetworkX DiGraph</a> object, so its methods are available.</p> <a class="reference internal image-reference" href="_images/memory-hierarchy.jpg"><img alt="_images/memory-hierarchy.jpg" src="_images/memory-hierarchy.jpg" style="width: 800px;"/></a> <p>The memory hierarchy object includes:</p> <ul class=simple > <li><p><strong>operational_array</strong>: The operational array to which this memory hierarchy will connect. This is required to correctly infer the interconnection through the operational array’s dimensions. Through the <cite>add_memory()</cite> calls it adds a new MemoryLevel to the graph. This requires for each call a:</p> <li><p><strong>memory_instance</strong>: A MemoryInstance object you are adding to the hierarchy.</p> <li><p><strong>operands</strong>: The virtual memory operands this MemoryLevel stores.</p> <li><p><strong>port_alloc</strong>: The directionality of the memory instance’s different ports, as described above.</p> <li><p><strong>served_dimensions</strong>: The different dimensions that this memory level will serve, as described above.</p> </ul> </section> <section id=core > <h2 id=core >Core<a class=headerlink  href="#core" title="Permalink to this heading">¶</a></h2> <p>The operational array and the memory hierarchy together form a core of the accelerator.</p> <a class="reference internal image-reference" href="_images/core.jpg"><img alt="_images/core.jpg" src="_images/core.jpg" style="width: 400px;"/></a> <p>The core object includes:</p> <ul class=simple > <li><p><strong>id</strong>: The id of this core.</p> <li><p><strong>operational_array</strong>: The operational array of this core.</p> <li><p><strong>memory_hierarchy</strong>: The memory hierarchy of 
this core.</p> </ul> </section> <section id=hw-accelerator-model > <h2 id=hw-accelerator-model >HW Accelerator Model<a class=headerlink  href="#hw-accelerator-model" title="Permalink to this heading">¶</a></h2> <p>Multiple cores are combined together into the HW Accelerator, which is the main object modelling the HW behaviour.</p> <p>The accelerator object includes:</p> <ul class=simple > <li><p><strong>name</strong>: A user-defined name for this accelerator.</p> <li><p><strong>core_set</strong>: The set of cores comprised within the accelerator.</p> <li><p><strong>global_buffer</strong>: A memory instance shared across cores. This is currently un-used.</p> </ul> </section> <section id=modelled-examples > <h2 id=modelled-examples >Modelled examples<a class=headerlink  href="#modelled-examples" title="Permalink to this heading">¶</a></h2> <p>In this repository, we have modeled 5 well-known DNN accelerators, which are Meta prototype [1], TPU [2], Edge TPU [3], Ascend [4], Tesla NPU [5], and, for our depth-first scheduling research. To make a fair and relevant comparison, we normalized all of them to have 1024 MACs and maximally 2MB global buffer (GB) but kept their spatial unrolling and local buffer settings, as shown in Table I Idx 1/3/5/7/9. 
Besides, we constructed a variant of every normalized architecture (by changing its on-chip memory hierarchy), denoted with ‘DF’ in the end of the name, as shown in Table I Idx 2/4/6/8/10.</p> </section> <section id=specific-settings > <h2 id=specific-settings >Specific settings<a class=headerlink  href="#specific-settings" title="Permalink to this heading">¶</a></h2> <a class="reference internal image-reference" href="https://user-images.githubusercontent.com/55059827/183848886-c85b9950-5e49-47c9-8a47-ad05062debc3.png"><img alt="Alternative text" src="https://user-images.githubusercontent.com/55059827/183848886-c85b9950-5e49-47c9-8a47-ad05062debc3.png" style="width: 800px;"/></a> <div class="admonition note"> <p class=admonition-title >Note</p> <p>K is for output channel; C is for input channel; OX and OY are the output feature map’s spatial dimensions; FX and FY are the weight’s spatial dimensions.</p> </div> </section> <section id=references > <h2 id=references >References<a class=headerlink  href="#references" title="Permalink to this heading">¶</a></h2> <p>[1] H. E. Sumbul, T. F. Wu, Y. Li, S. S. Sarwar, W. Koven, E. Murphy- Trotzky, X. Cai, E. Ansari, D. H. Morris, H. Liu, D. Kim, E. Beigne, R. Labs, and Meta, “System-level design and integration of a prototype ar/vr hardware featuring a custom low-power dnn accelerator chip in 7nm technology for codec avatars,” in 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 01–08.</p> <p>[2] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. 
Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, “In-datacenter performance analysis of a tensor processing unit,” SIGARCH Comput. Archit. News, vol. 45, no. 2, p. 1–12, jun 2017.</p> <p>[3] A. Yazdanbakhsh, K. Seshadri, B. Akin, J. Laudon, and R. Narayanaswami, “An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks,” arXiv e-prints, p. arXiv:2102.10423, Feb. 2021.</p> <p>[4] H. Liao, J. Tu, J. Xia, H. Liu, X. Zhou, H. Yuan, and Y. Hu, “Ascend: a scalable and unified architecture for ubiquitous deep neural network computing : Industry track paper,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 789–801.</p> <p>[5] E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, and G. S. Sachdev, “Compute solution for tesla’s full self-driving computer,” IEEE Micro, vol. 40, no. 2, pp. 
25–35, 2020.</p> </section> </section> </article> </div> </div> </main> </div> <footer class=md-footer > <div class=md-footer-nav > <nav class="md-footer-nav__inner md-grid"> <a href=workload.html  title=Workload  class="md-flex md-footer-nav__link md-footer-nav__link--prev" rel=prev > <div class="md-flex__cell md-flex__cell--shrink"> <i class="md-icon md-icon--arrow-back md-footer-nav__button"></i> </div> <div class="md-flex__cell md-flex__cell--stretch md-footer-nav__title"> <span class=md-flex__ellipsis > <span class=md-footer-nav__direction > "Previous" </span> Workload </span> </div> </a> <a href=mapping.html  title=Mapping  class="md-flex md-footer-nav__link md-footer-nav__link--next" rel=next > <div class="md-flex__cell md-flex__cell--stretch md-footer-nav__title"><span class=md-flex__ellipsis > <span class=md-footer-nav__direction > "Next" </span> Mapping </span> </div> <div class="md-flex__cell md-flex__cell--shrink"><i class="md-icon md-icon--arrow-forward md-footer-nav__button"></i> </div> </a> </nav> </div> <div class="md-footer-meta md-typeset"> <div class="md-footer-meta__inner md-grid"> <div class=md-footer-copyright > <div class=md-footer-copyright__highlight > &#169; Copyright 2022, Arne Symons. </div> Created using <a href="http://www.sphinx-doc.org/">Sphinx</a> 7.1.2. and <a href="https://github.com/bashtage/sphinx-material/">Material for Sphinx</a> </div> </div> </div> </footer> <script src="_static/javascripts/application.js"></script> <script>app.initialize({version: "1.0.4", url: {base: ".."}})</script>
\ No newline at end of file
+<!DOCTYPE html> <html lang=en > <meta charset=utf-8  /> <meta name=viewport  content="width=device-width, initial-scale=1.0" /><meta name=viewport  content="width=device-width, initial-scale=1" /> <meta name=viewport  content="width=device-width,initial-scale=1"> <meta http-equiv=x-ua-compatible  content="ie=edge"> <meta name="lang:clipboard.copy" content="Copy to clipboard"> <meta name="lang:clipboard.copied" content="Copied to clipboard"> <meta name="lang:search.language" content=en > <meta name="lang:search.pipeline.stopwords" content=True > <meta name="lang:search.pipeline.trimmer" content=True > <meta name="lang:search.result.none" content="No matching documents"> <meta name="lang:search.result.one" content="1 matching document"> <meta name="lang:search.result.other" content="# matching documents"> <meta name="lang:search.tokenizer" content="[\s\-]+"> <link href="https://fonts.gstatic.com/" rel=preconnect  crossorigin> <link href="https://fonts.googleapis.com/css?family=Roboto+Mono:400,500,700|Roboto:300,400,400i,700&display=fallback" rel=stylesheet > <style> body, input { font-family: "Roboto", "Helvetica Neue", Helvetica, Arial, sans-serif } code, kbd, pre { font-family: "Roboto Mono", "Courier New", Courier, monospace } </style> <link rel=stylesheet  href="_static/stylesheets/application.css"/> <link rel=stylesheet  href="_static/stylesheets/application-palette.css"/> <link rel=stylesheet  href="_static/stylesheets/application-fixes.css"/> <link rel=stylesheet  href="_static/fonts/material-icons.css"/> <meta name=theme-color  content="#3f51b5"> <script src="_static/javascripts/modernizr.js"></script> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXXX"></script> <script> window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag('js', new Date()); gtag('config', 'UA-XXXXX'); </script> <title>Hardware Architecture &#8212; ZigZag 2.0.0 documentation</title> <link rel=stylesheet  type="text/css" 
href="_static/pygments.css?v=83e35b93" /> <link rel=stylesheet  type="text/css" href="_static/material.css?v=79c92029" /> <script data-url_root="./" id=documentation_options  src="_static/documentation_options.js?v=73cda6fb"></script> <script src="_static/doctools.js?v=888ff710"></script> <script src="_static/sphinx_highlight.js?v=4825356b"></script> <link rel=icon  href="_static/zigzag_logo_white_32x32.svg"/> <link rel=index  title=Index  href=genindex.html  /> <link rel=search  title=Search  href=search.html  /> <link rel=next  title=Mapping  href=mapping.html  /> <link rel=prev  title=Workload  href=workload.html  /> <body dir=ltr data-md-color-primary=blue-grey data-md-color-accent=grey> <svg class=md-svg > <defs data-children-count=0 > <svg xmlns="http://www.w3.org/2000/svg" width=416  height=448  viewBox="0 0 416 448" id=__github ><path fill=currentColor  d="M160 304q0 10-3.125 20.5t-10.75 19T128 352t-18.125-8.5-10.75-19T96 304t3.125-20.5 10.75-19T128 256t18.125 8.5 10.75 19T160 304zm160 0q0 10-3.125 20.5t-10.75 19T288 352t-18.125-8.5-10.75-19T256 304t3.125-20.5 10.75-19T288 256t18.125 8.5 10.75 19T320 304zm40 0q0-30-17.25-51T296 232q-10.25 0-48.75 5.25Q229.5 240 208 240t-39.25-2.75Q130.75 232 120 232q-29.5 0-46.75 21T56 304q0 22 8 38.375t20.25 25.75 30.5 15 35 7.375 37.25 1.75h42q20.5 0 37.25-1.75t35-7.375 30.5-15 20.25-25.75T360 304zm56-44q0 51.75-15.25 82.75-9.5 19.25-26.375 33.25t-35.25 21.5-42.5 11.875-42.875 5.5T212 416q-19.5 0-35.5-.75t-36.875-3.125-38.125-7.5-34.25-12.875T37 371.5t-21.5-28.75Q0 312 0 260q0-59.25 34-99-6.75-20.5-6.75-42.5 0-29 12.75-54.5 27 0 47.5 9.875t47.25 30.875Q171.5 96 212 96q37 0 70 8 26.25-20.5 46.75-30.25T376 64q12.75 25.5 12.75 54.5 0 21.75-6.75 42 34 40 34 99.5z"/></svg> </defs> </svg> <input class=md-toggle  data-md-toggle=drawer  type=checkbox  id=__drawer > <input class=md-toggle  data-md-toggle=search  type=checkbox  id=__search > <label class=md-overlay  data-md-component=overlay  for=__drawer ></label> <a 
href="#hardware" tabindex=1  class=md-skip > Skip to content </a> <header class=md-header  data-md-component=header > <nav class="md-header-nav md-grid"> <div class="md-flex navheader"> <div class="md-flex__cell md-flex__cell--shrink"> <a href=index.html  title="ZigZag 2.0.0 documentation" class="md-header-nav__button md-logo"> &nbsp; </a> </div> <div class="md-flex__cell md-flex__cell--shrink"> <label class="md-icon md-icon--menu md-header-nav__button" for=__drawer ></label> </div> <div class="md-flex__cell md-flex__cell--stretch"> <div class="md-flex__ellipsis md-header-nav__title" data-md-component=title > <span class=md-header-nav__topic >ZigZag Framework</span> <span class=md-header-nav__topic > Hardware Architecture </span> </div> </div> <div class="md-flex__cell md-flex__cell--shrink"> <label class="md-icon md-icon--search md-header-nav__button" for=__search ></label> <div class=md-search  data-md-component=search  role=dialog > <label class=md-search__overlay  for=__search ></label> <div class=md-search__inner  role=search > <form class=md-search__form  action=search.html  method=get  name=search > <input type=text  class=md-search__input  name=q  placeholder=""Search"" autocapitalize=off  autocomplete=off  spellcheck=false  data-md-component=query  data-md-state=active > <label class="md-icon md-search__icon" for=__search ></label> <button type=reset  class="md-icon md-search__icon" data-md-component=reset  tabindex=-1 > &#xE5CD; </button> </form> <div class=md-search__output > <div class=md-search__scrollwrap  data-md-scrollfix> <div class=md-search-result  data-md-component=result > <div class=md-search-result__meta > Type to start searching </div> <ol class=md-search-result__list ></ol> </div> </div> </div> </div> </div> </div> <div class="md-flex__cell md-flex__cell--shrink"> <div class=md-header-nav__source > <a href="https://github.com/kuleuven-micas/zigzag" title="Go to repository" class=md-source  data-md-source=github > <div class=md-source__icon 
> <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" width=28  height=28 > <use xlink:href="#__github" width=24  height=24 ></use> </svg> </div> <div class=md-source__repository > ZigZag Framework </div> </a> </div> </div> <script src="_static/javascripts/version_dropdown.js"></script> <script> var json_loc = ""versions.json"", target_loc = "../", text = "Versions"; $( document ).ready( add_version_dropdown(json_loc, target_loc, text)); </script> </div> </nav> </header> <div class=md-container > <nav class=md-tabs  data-md-component=tabs > <div class="md-tabs__inner md-grid"> <ul class=md-tabs__list > <li class=md-tabs__item ><a href=index.html  class=md-tabs__link >ZigZag 2.0.0 documentation</a> <li class=md-tabs__item ><a href=user-guide.html  class=md-tabs__link >User Guide</a> </ul> </div> </nav> <main class=md-main > <div class="md-main__inner md-grid" data-md-component=container > <div class="md-sidebar md-sidebar--primary" data-md-component=navigation > <div class=md-sidebar__scrollwrap > <div class=md-sidebar__inner > <nav class="md-nav md-nav--primary" data-md-level=0 > <label class="md-nav__title md-nav__title--site" for=__drawer > <a href=index.html  title="ZigZag 2.0.0 documentation" class="md-nav__button md-logo"> <img src="_static/" alt=" logo" width=48  height=48 > </a> <a href=index.html  title="ZigZag 2.0.0 documentation">ZigZag Framework</a> </label> <div class=md-nav__source > <a href="https://github.com/kuleuven-micas/zigzag" title="Go to repository" class=md-source  data-md-source=github > <div class=md-source__icon > <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" width=28  height=28 > <use xlink:href="#__github" width=24  height=24 ></use> </svg> </div> <div class=md-source__repository > ZigZag Framework </div> </a> </div> <ul class=md-nav__list > <li class=md-nav__item > <span class="md-nav__link caption"><span class=caption-text 
>Contents:</span></span> <li class=md-nav__item > <a href=installation.html  class=md-nav__link >Installing ZigZag</a> <li class=md-nav__item > <a href=getting-started.html  class=md-nav__link >Getting Started</a> <li class=md-nav__item > <a href=api.html  class=md-nav__link >ZigZag API</a> <li class=md-nav__item > <a href=user-guide.html  class=md-nav__link >User Guide</a> <ul class=md-nav__list > <li class=md-nav__item > <a href=workload.html  class=md-nav__link >Workload</a> <li class=md-nav__item > <input class="md-toggle md-nav__toggle" data-md-toggle=toc  type=checkbox  id=__toc > <label class="md-nav__link md-nav__link--active" for=__toc > Hardware Architecture </label> <a href="#" class="md-nav__link md-nav__link--active">Hardware Architecture</a> <nav class="md-nav md-nav--secondary"> <label class=md-nav__title  for=__toc >"Contents"</label> <ul class=md-nav__list  data-md-scrollfix=""> <li class=md-nav__item ><a href="#hardware--page-root" class=md-nav__link >Hardware Architecture</a><nav class=md-nav > <ul class=md-nav__list > <li class=md-nav__item ><a href="#operational-unit" class=md-nav__link >Operational Unit</a> <li class=md-nav__item ><a href="#operational-array" class=md-nav__link >Operational Array</a> <li class=md-nav__item ><a href="#memory-instance" class=md-nav__link >Memory Instance</a> <li class=md-nav__item ><a href="#memory-hierarchy" class=md-nav__link >Memory Hierarchy</a> <li class=md-nav__item ><a href="#core" class=md-nav__link >Core</a> <li class=md-nav__item ><a href="#hw-accelerator-model" class=md-nav__link >HW Accelerator Model</a> <li class=md-nav__item ><a href="#modelled-examples" class=md-nav__link >Modelled examples</a> <li class=md-nav__item ><a href="#specific-settings" class=md-nav__link >Specific settings</a> <li class=md-nav__item ><a href="#references" class=md-nav__link >References</a> </ul> </nav> <li class=md-nav__item ><a class=md-nav__extra_link  href="_sources/hardware.rst.txt">Show Source</a> </ul> </nav> <li 
class=md-nav__item > <a href=mapping.html  class=md-nav__link >Mapping</a> <li class=md-nav__item > <a href=stages.html  class=md-nav__link >Stages</a> <li class=md-nav__item > <a href=outputs.html  class=md-nav__link >Outputs</a> </ul> <li class=md-nav__item > <a href=future.html  class=md-nav__link >Future changes</a> <li class=md-nav__item > <a href=contribute.html  class=md-nav__link >Contribute</a> <li class=md-nav__item > <a href=publications.html  class=md-nav__link >Publications</a> <li class=md-nav__item > <a href=code-documentation.html  class=md-nav__link >Code Documentation</a> </ul> </nav> </div> </div> </div> <div class="md-sidebar md-sidebar--secondary" data-md-component=toc > <div class=md-sidebar__scrollwrap > <div class=md-sidebar__inner > <nav class="md-nav md-nav--secondary"> <label class=md-nav__title  for=__toc >"Contents"</label> <ul class=md-nav__list  data-md-scrollfix=""> <li class=md-nav__item ><a href="#hardware--page-root" class=md-nav__link >Hardware Architecture</a><nav class=md-nav > <ul class=md-nav__list > <li class=md-nav__item ><a href="#operational-unit" class=md-nav__link >Operational Unit</a> <li class=md-nav__item ><a href="#operational-array" class=md-nav__link >Operational Array</a> <li class=md-nav__item ><a href="#memory-instance" class=md-nav__link >Memory Instance</a> <li class=md-nav__item ><a href="#memory-hierarchy" class=md-nav__link >Memory Hierarchy</a> <li class=md-nav__item ><a href="#core" class=md-nav__link >Core</a> <li class=md-nav__item ><a href="#hw-accelerator-model" class=md-nav__link >HW Accelerator Model</a> <li class=md-nav__item ><a href="#modelled-examples" class=md-nav__link >Modelled examples</a> <li class=md-nav__item ><a href="#specific-settings" class=md-nav__link >Specific settings</a> <li class=md-nav__item ><a href="#references" class=md-nav__link >References</a> </ul> </nav> <li class=md-nav__item ><a class=md-nav__extra_link  href="_sources/hardware.rst.txt">Show Source</a> <li 
id=searchbox  class=md-nav__item > </ul> </nav> </div> </div> </div> <div class=md-content > <article class="md-content__inner md-typeset" role=main > <section id=hardware-architecture > <h1 id=hardware--page-root >Hardware Architecture<a class=headerlink  href="#hardware--page-root" title="Permalink to this heading">¶</a></h1> <p>In this section, we introduce the general concept of how HW accelerators are modeled within ZigZag and the different well-known accelerators we provide as examples. We start from the smallest building block defined in ZigZag and work our way up towards an accelerator.</p> <section id=operational-unit > <h2 id=operational-unit >Operational Unit<a class=headerlink  href="#operational-unit" title="Permalink to this heading">¶</a></h2> <p>Accelerating inference of a NN requires the execution of multiplications and summations (accumulations) across multiple intermediate data (activations) using trained parameters (weights). The operational unit, typically a Multiplier, executes the multiplication of two data elements, typically an activation and a weight.</p> <a class="reference internal image-reference" href="_images/operational-unit.jpg"><img alt="_images/operational-unit.jpg" src="_images/operational-unit.jpg" style="width: 400px;"/></a> <p>The operational unit object has the following attributes:</p> <ul class=simple > <li><p><strong>input_precision</strong>: List of input operand (data) precision in the number of bits for each input operand (typically there are two input operands for a Multiplier).</p> <li><p><strong>output_precision</strong>: The bit precision of the operation’s output.</p> <li><p><strong>energy_cost</strong>: Energy of executing a single operation (e.g., a multiplication).</p> <li><p><strong>area</strong>: The HW area overhead of a single operational unit (e.g., a multiplier).</p> </ul> </section> <section id=operational-array > <h2 id=operational-array >Operational Array<a class=headerlink  href="#operational-array" 
title="Permalink to this heading">¶</a></h2> <p>Inferencing a NN typically requires millions of operations, and an accelerator typically includes an array of operational units that can execute these operations in parallel. This can significantly speed up the computations, as well as increase energy efficiency which is covered later.</p> <p>The array can have one or multiple dimensions, each with a size. The importance of these dimensions is explained in the introduction of the memory hierarchy.</p> <a class="reference internal image-reference" href="_images/operational-array.jpg"><img alt="_images/operational-array.jpg" src="_images/operational-array.jpg" style="width: 400px;"/></a> <p>The operational array object has the following attributes:</p> <ul class=simple > <li><p><strong>operational_unit</strong>: The operational unit from which the array is built.</p> <li><p><strong>dimensions</strong>: The dimensions of the array. This should be defined as a dict, with the keys being the identifier of each dimension of the array (typically ‘D1’, ‘D2’, …) and the values being the size of this dimension (i.e. the size of the array along that dimension).</p> </ul> </section> <section id=memory-instance > <h2 id=memory-instance >Memory Instance<a class=headerlink  href="#memory-instance" title="Permalink to this heading">¶</a></h2> <p>In order to store the different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. 
The instances define how big each memory is in terms of capacity and area, what the cost of writing and reading from these memories is, what its bandwidth is, and how many read/write/read-write ports it includes.</p> <a class="reference internal image-reference" href="_images/memory-instance.jpg"><img alt="_images/memory-instance.jpg" src="_images/memory-instance.jpg" style="width: 400px;"/></a> <p>The memory instance object has the following attributes:</p> <ul class=simple > <li><p><strong>name</strong>: A name for the instance</p> <li><p><strong>size</strong>: The memory size in bits.</p> <li><p><strong>r_bw/w_bw</strong>: A read or write bandwidth in the number of bits per cycle.</p> <li><p><strong>r_cost/w_cost</strong>: A read or write energy cost.</p> <li><p><strong>area</strong>: Area overhead of the instance.</p> <li><p><strong>r_port/w_port/rw_port</strong>: The number of read/write/read-write ports the instance has available.</p> <li><p><strong>latency</strong>: The latency of memory access in the number of cycles, i.e., after a read/write request to a memory address, how many cycles the memory takes to provide/receive the corresponding data. (For now, this attribute is not actively used. We assume that it is 1 to model the data prefetching behavior thanks to the deterministic dataflow.)</p> </ul> <p>(optional) * <strong>min_r_granularity/min_w_granularity</strong>: The minimal memory read/write granularity (in bits) the memory supports. This attribute is used to better model memories that support half-word or quarter-word access patterns. For example, if a memory’s read bandwidth (wordlength) is 256 bit/cycle, its read energy (r_cost) is 100, and its min_r_granularity is 128 bits (i.e., assuming this memory allows half-word reads), reading 128 bits from it takes only 50 energy. 
In contrast, if min_r_granularity is not defined, or is defined as 256 bits, reading 128 bits from it takes 100 energy.</p> </section> <section id=memory-hierarchy > <h2 id=memory-hierarchy >Memory Hierarchy<a class=headerlink  href="#memory-hierarchy" title="Permalink to this heading">¶</a></h2> <p>Besides knowing what the specs of each memory instance are, the memory hierarchy encodes information with respect to the interconnection of the memories to the operational array, and to the other memory instances. This interconnection is achieved through multiple calls to <cite>add_memory()</cite>, where the first call(s) add the first level of memories, which connect to the operational array, and later calls connect to the lower memory levels. This builds a hierarchy of memories.</p> <p>To know if the memory should connect to the operational array or another lower memory level, it needs to know which data will be stored within the memories. To decouple the algorithmic side from the hardware side, this is achieved through the concept of ‘memory operands’ (as opposed to ‘algorithmic operands’, which are typically the I/O activations and weights W). You can think of the memory operands as virtual operands, which will later be linked to the actual algorithmic operands in the mapping file through the <cite>memory_operand_links</cite> attribute.</p> <p>Similarly to how the operational unit can be unrolled (forming an operational array), the memories can also be unrolled, where each memory accompanies either a single operational unit or all the operational units in one or more dimensions of the operational array. This is encoded through the <cite>served_dimensions</cite> attribute, which specifies if a single memory instance of this memory level serves all operational units in that dimension. 
This should be a set of one-hot-encoded tuples.</p> <p>Lastly, the different read/write/read-write ports of a memory instance are assigned to the different data movements possible in the hierarchy. There are four types of data movements in a hierarchy: from high (<em>fh</em>), to high (<em>th</em>), from low (<em>fl</em>), to low (<em>tl</em>). At the time of writing, these can be manually linked to one of the read/write/read-write ports through the following syntax: <cite>{port_type}_port_{port_number}</cite>, <em>port_type</em> being <em>r</em>, <em>w</em> or <em>rw</em> and <em>port_number</em> equal to the port number, starting from 1, which allows allocating multiple ports of the same type. Alternatively, these are automatically generated as a default if not provided to the <cite>add_memory()</cite> call.</p> <p>Internally, the MemoryHierarchy object extends the <a class="reference external" href="https://networkx.org/documentation/stable/reference/classes/digraph.html">NetworkX DiGraph</a> object, so its methods are available.</p> <a class="reference internal image-reference" href="_images/memory-hierarchy.jpg"><img alt="_images/memory-hierarchy.jpg" src="_images/memory-hierarchy.jpg" style="width: 800px;"/></a> <p>The memory hierarchy object includes:</p> <ul class=simple > <li><p><strong>operational_array</strong>: The operational array to which this memory hierarchy will connect. This is required to correctly infer the interconnection through the operational array’s dimensions. Through the <cite>add_memory()</cite> calls it adds a new MemoryLevel to the graph. 
This requires for each call a:</p> <li><p><strong>memory_instance</strong>: A MemoryInstance object you are adding to the hierarchy.</p> <li><p><strong>operands</strong>: The virtual memory operands this MemoryLevel stores.</p> <li><p><strong>port_alloc</strong>: The directionality of the memory instance’s different ports, as described above.</p> <li><p><strong>served_dimensions</strong>: The different dimensions that this memory level will serve, as described above.</p> </ul> </section> <section id=core > <h2 id=core >Core<a class=headerlink  href="#core" title="Permalink to this heading">¶</a></h2> <p>The operational array and the memory hierarchy together form a core of the accelerator.</p> <a class="reference internal image-reference" href="_images/core.jpg"><img alt="_images/core.jpg" src="_images/core.jpg" style="width: 400px;"/></a> <p>The core object includes:</p> <ul class=simple > <li><p><strong>id</strong>: The id of this core.</p> <li><p><strong>operational_array</strong>: The operational array of this core.</p> <li><p><strong>memory_hierarchy</strong>: The memory hierarchy of this core.</p> </ul> </section> <section id=hw-accelerator-model > <h2 id=hw-accelerator-model >HW Accelerator Model<a class=headerlink  href="#hw-accelerator-model" title="Permalink to this heading">¶</a></h2> <p>Multiple cores are combined together into the HW Accelerator, which is the main object modeling the HW behavior.</p> <p>The accelerator object includes:</p> <ul class=simple > <li><p><strong>name</strong>: A user-defined name for this accelerator.</p> <li><p><strong>core_set</strong>: The set of cores comprised within the accelerator.</p> <li><p><strong>global_buffer</strong>: A memory instance shared across cores. 
This is currently unused.</p> </ul> </section> <section id=modelled-examples > <h2 id=modelled-examples >Modelled examples<a class=headerlink  href="#modelled-examples" title="Permalink to this heading">¶</a></h2> <p>In this repository, we have modeled 5 well-known DNN accelerators for our depth-first scheduling research: Meta prototype [1], TPU [2], Edge TPU [3], Ascend [4], and Tesla NPU [5]. To make a fair and relevant comparison, we normalized all of them to have 1024 MACs and at most a 2MB global buffer (GB) but kept their spatial unrolling and local buffer settings, as shown in Table I Idx 1/3/5/7/9. Besides, we constructed a variant of every normalized architecture (by changing its on-chip memory hierarchy), denoted with ‘DF’ at the end of the name, as shown in Table I Idx 2/4/6/8/10.</p> </section> <section id=specific-settings > <h2 id=specific-settings >Specific settings<a class=headerlink  href="#specific-settings" title="Permalink to this heading">¶</a></h2> <a class="reference internal image-reference" href="https://user-images.githubusercontent.com/55059827/183848886-c85b9950-5e49-47c9-8a47-ad05062debc3.png"><img alt="Alternative text" src="https://user-images.githubusercontent.com/55059827/183848886-c85b9950-5e49-47c9-8a47-ad05062debc3.png" style="width: 800px;"/></a> <div class="admonition note"> <p class=admonition-title >Note</p> <p>K is for output channel; C is for input channel; OX and OY are the output feature map’s spatial dimensions; FX and FY are the weight’s spatial dimensions.</p> </div> </section> <section id=references > <h2 id=references >References<a class=headerlink  href="#references" title="Permalink to this heading">¶</a></h2> <p>[1] H. E. Sumbul, T. F. Wu, Y. Li, S. S. Sarwar, W. Koven, E. Murphy- Trotzky, X. Cai, E. Ansari, D. H. Morris, H. Liu, D. Kim, E. Beigne, R. 
Labs, and Meta, “System-level design and integration of a prototype ar/vr hardware featuring a custom low-power dnn accelerator chip in 7nm technology for codec avatars,” in 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 01–08.</p> <p>[2] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, “In-datacenter performance analysis of a tensor processing unit,” SIGARCH Comput. Archit. News, vol. 45, no. 2, p. 1–12, jun 2017.</p> <p>[3] A. Yazdanbakhsh, K. Seshadri, B. Akin, J. Laudon, and R. Narayanaswami, “An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks,” arXiv e-prints, p. arXiv:2102.10423, Feb. 2021.</p> <p>[4] H. Liao, J. Tu, J. Xia, H. Liu, X. Zhou, H. Yuan, and Y. Hu, “Ascend: a scalable and unified architecture for ubiquitous deep neural network computing : Industry track paper,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 789–801.</p> <p>[5] E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, and G. S. Sachdev, “Compute solution for tesla’s full self-driving computer,” IEEE Micro, vol. 40, no. 2, pp. 
25–35, 2020.</p> </section> </section> </article> </div> </div> </main> </div> <footer class=md-footer > <div class=md-footer-nav > <nav class="md-footer-nav__inner md-grid"> <a href=workload.html  title=Workload  class="md-flex md-footer-nav__link md-footer-nav__link--prev" rel=prev > <div class="md-flex__cell md-flex__cell--shrink"> <i class="md-icon md-icon--arrow-back md-footer-nav__button"></i> </div> <div class="md-flex__cell md-flex__cell--stretch md-footer-nav__title"> <span class=md-flex__ellipsis > <span class=md-footer-nav__direction > "Previous" </span> Workload </span> </div> </a> <a href=mapping.html  title=Mapping  class="md-flex md-footer-nav__link md-footer-nav__link--next" rel=next > <div class="md-flex__cell md-flex__cell--stretch md-footer-nav__title"><span class=md-flex__ellipsis > <span class=md-footer-nav__direction > "Next" </span> Mapping </span> </div> <div class="md-flex__cell md-flex__cell--shrink"><i class="md-icon md-icon--arrow-forward md-footer-nav__button"></i> </div> </a> </nav> </div> <div class="md-footer-meta md-typeset"> <div class="md-footer-meta__inner md-grid"> <div class=md-footer-copyright > <div class=md-footer-copyright__highlight > &#169; Copyright 2022, Arne Symons. </div> Created using <a href="http://www.sphinx-doc.org/">Sphinx</a> 7.1.2. and <a href="https://github.com/bashtage/sphinx-material/">Material for Sphinx</a> </div> </div> </div> </footer> <script src="_static/javascripts/application.js"></script> <script>app.initialize({version: "1.0.4", url: {base: ".."}})</script>
\ No newline at end of file
diff --git a/searchindex.js b/searchindex.js
index 92581123..918b4a1e 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["api", "code-documentation", "contribute", "future", "getting-started", "hardware", "index", "installation", "mapping", "outputs", "publications", "stages", "user-guide", "workload"], "filenames": ["api.rst", "code-documentation.rst", "contribute.rst", "future.rst", "getting-started.rst", "hardware.rst", "index.rst", "installation.rst", "mapping.rst", "outputs.rst", "publications.rst", "stages.rst", "user-guide.rst", "workload.rst"], "titles": ["ZigZag API", "Code Documentation", "Contribute", "Future changes", "Getting Started", "Hardware Architecture", "Welcome to ZigZag\u2019s documentation!", "Installing ZigZag", "Mapping", "Outputs", "Publications", "Stages", "User Guide", "Workload"], "terms": {"onc": [0, 7], "i": [0, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13], "avail": [0, 2, 5, 7], "your": [0, 2, 7, 8], "python": [0, 2, 4, 7, 13], "path": [0, 4, 13], "you": [0, 2, 3, 4, 5, 7, 8, 9, 11, 13], "can": [0, 2, 3, 4, 5, 7, 8, 9, 11, 13], "import": [0, 2, 4, 5, 13], "ani": [0, 2, 7, 11], "file": [0, 2, 4, 5, 7, 8, 9, 11, 13], "from": [0, 5, 11, 13], "thi": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13], "function": [0, 2, 7, 11], "take": [0, 3, 7, 13], "an": [0, 2, 4, 5, 7, 9, 11], "workload": [0, 4, 6, 8, 11, 12], "hardwar": [0, 4, 6, 8, 11, 12, 13], "architectur": [0, 4, 6, 10, 11, 12], "map": [0, 3, 4, 5, 6, 9, 12], "return": [0, 9, 11], "perform": [0, 5, 10], "execut": [0, 2, 4, 5, 8, 11, 13], "model": [0, 3, 4, 6], "": [0, 2, 3, 4, 5, 8, 10, 11], "layer": [0, 4, 6, 8, 9, 11], "under": [0, 2, 4, 8], "given": [0, 4, 11], "constraint": [0, 4], "energi": [0, 3, 4, 5, 9, 10, 11], "latenc": [0, 3, 4, 5, 6, 9, 11], "cme": [0, 11], "acceler": [0, 4, 6, 8, 9, 10, 11, 13], "opt": 0, "dump_filename_pattern": [0, 4], "output": [0, 4, 5, 6, 11, 12, 13], "datetim": [0, 11], "json": [0, 9, 11], "pickle_filenam": 0, "list_of_cm": 0, "pickl": [0, 11], "The": [0, 1, 2, 4, 5, 6, 8, 9, 12, 13], "input": [0, 3, 4, 5, 8, 9, 13], "ar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
13], "A": [0, 5, 6, 8, 10, 13], "neural": [0, 5, 10], "network": [0, 4, 5, 10], "defin": [0, 4, 5, 9, 10, 11, 13], "onnx": [0, 3, 4, 8, 11], "format": [0, 1, 2, 9], "own": [0, 11, 13], "high": [0, 5, 10], "level": [0, 5, 11], "hw": [0, 3, 4, 6, 8, 11], "descript": [0, 8, 11], "specifi": [0, 5, 9], "core": [0, 4, 6, 8, 11, 13], "alloc": [0, 5, 8, 10, 11], "spatial": [0, 3, 5, 8, 9, 10, 13], "option": 0, "tempor": [0, 3, 4, 6, 9], "order": [0, 2, 3, 5, 10, 11], "memori": [0, 3, 6, 8, 9, 10, 11, 13], "operand": [0, 5, 8, 13], "link": [0, 1, 5, 8, 13], "optim": [0, 4, 6, 10], "target": 0, "It": [0, 4, 9, 11, 13], "edp": [0, 11], "delai": 0, "product": 0, "name": [0, 4, 5, 8, 13], "result": [0, 6, 9], "which": [0, 4, 5, 8, 9, 11, 13], "includ": [0, 2, 3, 5, 9], "all": [0, 2, 4, 5, 9, 11, 13], "detail": [0, 2, 6, 7, 11], "metadata": 0, "analys": 0, "debug": 0, "number": [0, 5, 9], "indic": [0, 13], "overal": 0, "consum": 0, "run": [0, 2, 6, 7, 8, 11], "user": [0, 3, 4, 5, 6, 7, 11], "wai": [0, 1, 2, 4, 5, 9, 11, 13], "cycl": [0, 5], "count": 0, "collect": 0, "cost": [0, 3, 4, 5, 6, 13], "evalu": [0, 5, 11], "stand": 0, "we": [0, 2, 3, 4, 5, 11], "demonstr": 0, "how": [0, 2, 4, 5, 7, 8, 9, 11], "us": [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 13], "multipl": [0, 5, 6, 10, 11, 13], "demo": 0, "comment": [1, 2], "within": [1, 5, 8, 11, 13], "sourc": [1, 2], "zigzag": [1, 4, 5, 8, 9, 11, 12, 13], "framework": [1, 2, 4, 6, 7, 8, 10, 11, 12, 13], "support": [1, 3, 4, 6, 11], "auto": [1, 10], "doxygen": 1, "automat": [1, 4, 5, 6, 8, 11, 13], "updat": [1, 2, 3, 11], "soon": 1, "somebodi": 1, "push": 1, "someth": 1, "master": 1, "branch": 1, "github": [1, 7, 10], "repo": [1, 13], "project": [1, 6, 10], "follow": [1, 2, 4, 5, 8, 9, 11, 12, 13], "access": [1, 3, 5], "latest": 1, "version": [1, 6, 11], "when": [2, 8, 11, 13], "pleas": [2, 9, 11], "consid": 2, "googl": 2, "style": 2, "guid": [2, 3, 6], "docstr": 2, "class": [2, 3, 11], "method": [2, 5, 9], "exampl": [2, 4, 8, 9, 11, 13], 
"found": [2, 11, 13], "throughout": 2, "here": [2, 3, 4, 8, 10, 13], "accordingli": 2, "In": [2, 4, 5, 6, 9, 11, 13], "packag": [2, 6], "call": [2, 5], "bumpver": 2, "twine": 2, "These": [2, 11], "instal": [2, 6], "pip": [2, 7], "first": [2, 5, 6, 11], "pull": 2, "make": [2, 3, 5, 7, 9], "sure": 2, "have": [2, 4, 5, 11, 13], "remot": 2, "cahng": 2, "merg": 2, "conflict": 2, "chang": [2, 5, 6, 11], "commit": 2, "Then": [2, 13], "command": [2, 4, 13], "patch": 2, "m": [2, 5, 10], "upload": 2, "dist": 2, "zigzag_ds": 2, "x": [2, 5], "y": [2, 5], "z": [2, 5], "whl": 2, "dse": [2, 6, 7], "tar": 2, "gz": 2, "provid": [2, 4, 5, 6, 7, 8, 11, 12, 13], "sever": 2, "differ": [2, 4, 5, 6, 9, 11, 13], "There": [2, 5, 9], "mani": [2, 5], "public": [2, 6], "relat": 2, "page": [2, 6], "allow": [2, 5, 8], "everyon": 2, "get": [2, 6, 7], "familiar": 2, "more": [2, 3, 4, 5, 7, 11, 12, 13], "about": [2, 9, 11], "implement": 2, "ad": [2, 5, 8], "mandatori": 2, "what": [2, 5, 8, 9, 11], "doe": 2, "achiev": [2, 5], "newli": 2, "explicit": 2, "resid": [2, 11], "doc": 2, "folder": [2, 11], "restructuredtext": 2, "rst": 2, "decid": 2, "would": [2, 13], "best": [2, 11], "fit": 2, "exist": [2, 11], "one": [2, 4, 5, 11], "If": [2, 7, 9, 11, 13], "creat": [2, 8], "lower": [2, 5], "case": [2, 3, 6, 13], "letter": [2, 13], "hyphen": 2, "between": [2, 4, 6, 13], "word": 2, "after": [2, 7, 11], "need": [2, 4, 5, 8, 13], "add": [2, 3, 5, 7, 13], "toctre": 2, "index": [2, 6], "same": [2, 3, 5], "webpag": 2, "sphinx": 2, "should": [2, 5, 8, 9, 13], "both": [2, 3], "press": 2, "theme": 2, "easi": [2, 9], "through": [2, 4, 5, 6, 7, 8, 10, 11, 13], "requir": [2, 4, 5, 7, 8, 9, 11, 13], "txt": [2, 7], "cd": 2, "r": [2, 5, 7], "simpli": [2, 11], "b": [2, 5, 13], "html": 2, "entri": [2, 8], "point": [2, 6], "guidlin": 2, "paramet": [2, 5, 11], "constructor": 2, "download": 2, "describ": [2, 5, 13], "successfulli": 2, "configur": [2, 10], "done": 2, "either": [2, 5], "gui": 2, "conf": 2, "find": [3, 4, 6, 
11], "plan": 3, "oper": [3, 8, 11], "ancestor": 3, "layernod": [3, 13], "dummynod": [3, 13], "fix": 3, "loop": [3, 10, 11, 13], "multi": [3, 6], "dimension": 3, "unrol": [3, 5], "fraction": 3, "account": [3, 11, 13], "bandwidth": [3, 5], "loma": [3, 4, 10, 11], "memoryalloc": 3, "besid": [3, 4, 5, 11], "capac": [3, 5], "lpf": 3, "limit": [3, 10], "visualis": 3, "tutori": 3, "remak": 3, "tabl": [3, 5], "without": 3, "df": [3, 5], "stage": [3, 4, 6, 9, 12], "stack": 3, "combin": [3, 5, 8, 9, 11], "common": 3, "versatil": 4, "tool": 4, "estim": [4, 6], "dl": [4, 6], "design": [4, 5, 6, 11], "multitud": 4, "set": [4, 11], "As": [4, 11], "step": [4, 11], "nn": [4, 5, 13], "onto": [4, 6, 8], "go": 4, "alexnet": 4, "ha": [4, 5, 11, 12, 13], "been": 4, "shape": 4, "infer": [4, 5], "mean": 4, "tensor": [4, 5, 13], "intermedi": [4, 5, 13], "inform": [4, 5, 8, 9, 11, 12, 13], "know": [4, 5, 8, 9, 13], "correctli": [4, 5, 13], "tpu": [4, 5], "like": [4, 11, 13], "tpu_lik": 4, "py": [4, 8, 11, 13], "must": [4, 11], "suggest": 4, "resourc": [4, 6, 8], "alexnet_on_tpu_lik": 4, "gener": [4, 5, 6, 8, 9, 11, 12], "ran": 4, "main": [4, 5, 7, 9, 13], "pars": [4, 11, 13], "contain": [4, 8, 13], "program": 4, "flow": [4, 11], "document": [4, 7, 11, 12], "main_onnx": [4, 13], "note": [4, 9], "construct": [4, 5], "becaus": 4, "object": [4, 5, 9, 11, 13], "respect": [4, 5, 9], "modul": [4, 6], "other": [4, 5, 11, 13], "also": [4, 5, 7, 8, 9, 11, 13], "see": [4, 9, 13], "section": [4, 5, 9, 11], "manual": [4, 5, 6, 8, 11], "definit": [4, 8, 9, 11], "resnet18": [4, 8, 13], "salsa": [4, 11], "search": [4, 6], "engin": [4, 6, 11], "util": [4, 9], "schedul": [4, 5, 6, 11], "than": 4, "main_onnx_salsa": 4, "dure": 4, "save": [4, 9], "depend": [4, 7, 13], "total": [4, 11], "five": [4, 12], "each": [4, 5, 9, 11, 13], "node": [4, 8, 9, 11], "onnxmodelparserstag": [4, 8, 11, 13], "wa": 4, "minimallatencystag": [4, 11], "refer": [4, 13], "introduc": 5, "concept": [5, 11], "well": 5, "known": 5, 
"start": [5, 6, 7, 11], "smallest": 5, "build": [5, 12, 13], "block": [5, 12], "work": [5, 9], "our": [5, 11], "up": [5, 11], "toward": [5, 10], "summat": 5, "accumul": 5, "across": [5, 10, 11], "data": [5, 9], "activ": 5, "train": 5, "weight": [5, 13], "typic": [5, 8], "multipli": 5, "two": [5, 9], "element": [5, 11], "attribut": [5, 9, 13], "input_precis": 5, "list": [5, 11, 13], "precis": [5, 13], "bit": [5, 13], "2": 5, "output_precis": 5, "energy_cost": 5, "singl": [5, 11], "area": [5, 10], "overhead": 5, "inferenc": 5, "million": 5, "speed": 5, "significantli": 5, "comput": [5, 6, 8, 10, 13], "increas": 5, "effici": 5, "cover": 5, "later": [5, 11], "dimens": [5, 11, 13], "size": [5, 13], "explain": [5, 9, 11], "introduct": 5, "operational_unit": 5, "built": 5, "dict": 5, "kei": [5, 8], "being": [5, 11], "identifi": 5, "d1": 5, "d2": 5, "valu": [5, 11, 13], "e": [5, 8, 10, 11, 13], "along": 5, "store": 5, "attach": 5, "hierarch": 5, "fashion": 5, "big": 5, "term": 5, "write": [5, 8], "read": [5, 13], "port": 5, "r_bw": 5, "w_bw": 5, "per": 5, "r_cost": 5, "w_cost": 5, "r_port": 5, "w_port": 5, "rw_port": 5, "spec": [5, 7], "encod": [5, 8], "interconnect": [5, 11], "add_memori": 5, "where": [5, 11, 13], "connect": [5, 11], "To": [5, 11], "anoth": [5, 11], "decoupl": 5, "algorithm": [5, 6, 8, 10, 13], "side": [5, 13], "oppos": 5, "typical": 5, "o": [5, 8, 13], "w": [5, 8, 10], "think": [5, 11], "virtual": [5, 13], "actual": [5, 13], "memory_operand_link": [5, 8, 13], "similarli": 5, "form": 5, "accompani": 5, "served_dimens": 5, "serv": [5, 11], "hot": 5, "tupl": [5, 11], "lastli": 5, "assign": 5, "movev": 5, "possibl": [5, 13], "four": 5, "type": [5, 12, 13], "movement": 5, "fh": 5, "th": 5, "low": 5, "fl": 5, "tl": 5, "At": 5, "time": [5, 8], "syntax": 5, "port_typ": 5, "_port_": 5, "port_numb": 5, "rw": 5, "equal": 5, "1": [5, 10], "altern": [5, 7, 13], "default": [5, 8], "probid": 5, "intern": [5, 7, 10, 11], "memoryhierarchi": 5, "extend": 5, "networkx": 5, 
"digraph": 5, "so": [5, 11, 13], "its": [5, 7, 9, 11], "operational_arrai": 5, "new": [5, 6, 11], "memorylevel": 5, "graph": [5, 11, 13], "memory_inst": 5, "memoryinst": 5, "port_alloc": 5, "direction": 5, "abov": 5, "togeth": [5, 13], "id": [5, 8, 13], "memory_hierarchi": 5, "behaviour": [5, 11], "core_set": 5, "compris": 5, "global_buff": 5, "share": 5, "current": [5, 9], "un": 5, "repositori": [5, 7], "5": 5, "dnn": [5, 10], "meta": 5, "prototyp": 5, "edg": [5, 13], "3": [5, 7], "ascend": 5, "4": [5, 10], "tesla": 5, "npu": 5, "depth": [5, 6], "research": 5, "fair": 5, "relev": [5, 9], "comparison": 5, "normal": 5, "them": [5, 11], "1024": [5, 13], "mac": 5, "maxim": 5, "2mb": 5, "global": 5, "buffer": 5, "gb": 5, "kept": 5, "local": 5, "shown": 5, "idx": 5, "7": 5, "9": 5, "variant": 5, "everi": [5, 8], "chip": 5, "denot": 5, "end": [5, 7, 11], "6": [5, 10, 11], "8": [5, 7, 10], "10": [5, 10], "k": [5, 10, 13], "channel": [5, 13], "c": [5, 13], "ox": [5, 13], "oi": [5, 13], "featur": 5, "fx": [5, 13], "fy": [5, 13], "h": [5, 10], "sumbul": [5, 10], "t": [5, 8, 10, 13], "f": 5, "wu": [5, 10], "li": 5, "sarwar": 5, "koven": 5, "murphi": 5, "trotzki": 5, "cai": 5, "ansari": 5, "d": [5, 10], "morri": 5, "liu": [5, 10], "kim": 5, "beign": [5, 10], "lab": 5, "system": [5, 10, 11], "integr": [5, 10], "vr": 5, "custom": [5, 7, 8, 13], "power": 5, "7nm": 5, "technologi": 5, "codec": 5, "avatar": 5, "2022": [5, 10], "ieee": [5, 10], "circuit": [5, 10], "confer": [5, 10], "cicc": 5, "pp": [5, 10], "01": 5, "08": 5, "n": [5, 10], "p": [5, 10], "jouppi": 5, "young": 5, "patil": 5, "patterson": 5, "g": [5, 8, 11], "agraw": 5, "bajwa": 5, "bate": 5, "bhatia": 5, "boden": 5, "borcher": 5, "boyl": 5, "l": [5, 10], "cantin": 5, "chao": 5, "clark": 5, "j": 5, "coriel": 5, "dalei": 5, "dau": 5, "dean": 5, "gelb": 5, "v": [5, 10], "ghaemmaghami": 5, "gottipati": 5, "gulland": 5, "hagmann": 5, "ho": 5, "hogberg": 5, "hu": 5, "hundt": 5, "hurt": 5, "ibarz": 5, "jaffei": 5, 
"jaworski": 5, "kaplan": 5, "khaitan": 5, "killebrew": 5, "koch": 5, "kumar": 5, "laci": 5, "laudon": 5, "law": 5, "le": 5, "leari": 5, "luck": 5, "lundin": 5, "mackean": 5, "maggior": 5, "mahoni": 5, "miller": 5, "nagarajan": 5, "narayanaswami": 5, "ni": 5, "nix": 5, "norri": 5, "omernick": 5, "penukonda": 5, "phelp": 5, "ross": 5, "salek": 5, "samadiani": 5, "severn": 5, "sizikov": 5, "snelham": 5, "souter": 5, "steinberg": 5, "swing": 5, "tan": 5, "thorson": 5, "tian": 5, "toma": 5, "tuttl": 5, "vasudevan": 5, "walter": 5, "wang": 5, "wilcox": 5, "yoon": 5, "datacent": 5, "analysi": 5, "process": [5, 11], "sigarch": 5, "archit": 5, "vol": [5, 10], "45": 5, "12": 5, "jun": 5, "2017": 5, "yazdanbakhsh": 5, "seshadri": 5, "akin": 5, "convolut": [5, 8, 13], "arxiv": [5, 10], "print": [5, 10], "2102": 5, "10423": 5, "feb": 5, "2021": [5, 10], "liao": 5, "tu": 5, "xia": 5, "zhou": 5, "yuan": 5, "scalabl": 5, "unifi": 5, "ubiquit": 5, "deep": [5, 6, 10], "industri": 5, "track": 5, "paper": [5, 10, 13], "symposium": [5, 10], "hpca": [5, 10], "789": 5, "801": 5, "talp": 5, "sarma": 5, "venkataramanan": 5, "bannon": 5, "mcgee": 5, "floer": 5, "jalot": 5, "hsiong": 5, "arora": 5, "gorti": 5, "sachdev": 5, "solut": 5, "full": 5, "self": [5, 9], "drive": 5, "micro": 5, "40": 5, "25": 5, "35": 5, "2020": [5, 10], "space": [6, 11], "explor": [6, 11], "learn": 6, "bridg": 6, "gap": 6, "decis": 6, "special": 6, "fast": [6, 10], "accur": 6, "analyt": [6, 10], "crucial": 6, "part": [6, 8], "clone": 6, "analyz": [6, 10], "api": [6, 7], "get_hardware_performance_zigzag": 6, "futur": 6, "contribut": [6, 13], "guidelin": [6, 13], "upgrad": 6, "develop": 6, "idea": 6, "explan": 6, "studi": 6, "extens": 6, "cross": 6, "fuse": 6, "code": 6, "re": 7, "interest": [7, 11], "modif": [7, 9], "directli": 7, "venv": 7, "conda": 7, "environ": 7, "look": [7, 11], "want": [7, 8, 11, 13], "git": 7, "com": 7, "kuleuven": 7, "mica": 7, "http": 7, "anaconda": 7, "argument": [7, 11], "autom": [8, 10], 
"some": [8, 11, 13], "aspect": [8, 9, 11], "interfac": 8, "core_alloc": [8, 13], "spatial_map": [8, 9, 13], "parallel": [8, 13], "strategi": [8, 13], "spatialmappinggeneratorstag": [8, 11, 13], "hierarchi": [8, 9, 11], "extra": [8, 11], "flexibl": 8, "scheme": 8, "don": 8, "put": 8, "safe": 8, "bet": 8, "copi": [8, 11], "exact": 8, "detect": 8, "dictionari": [8, 13], "interpret": 9, "predefin": 9, "costmodelevalu": [9, 11], "knowledg": 9, "irrelev": 9, "handl": 9, "complexhandl": 9, "insid": [9, 11, 13], "represent": [9, 11], "invok": 9, "pass": 9, "__simplejsonrepr__": 9, "convert": [9, 11, 13], "onli": [9, 11, 13], "off": [9, 10], "load": [9, 13], "reli": 9, "def": 9, "simpl": [9, 11], "energy_tot": 9, "latency_total2": 9, "standard": 9, "filename_pattern": [9, 11], "lose": 9, "etc": [9, 11], "concern": 9, "__jsonrepr__": 9, "temporal_map": 9, "mem_utili_shar": 9, "word_access": 9, "memory_word_access": 9, "operational_energi": 9, "mac_energi": 9, "memory_energi": 9, "mem_energi": 9, "energy_breakdown_per_level": 9, "energy_breakdown": 9, "energy_breakdown_per_level_per_operand": 9, "energy_breakdown_furth": 9, "latency_without_onloading_without_offload": 9, "latency_total0": 9, "latency_with_onloading_without_offload": 9, "latency_total1": 9, "latency_with_onloading_with_offload": 9, "goal": [9, 11], "straightforward": 9, "care": 9, "certain": 9, "modifi": [9, 11], "parser": 9, "pointer": 10, "mei": 10, "houshmand": 10, "jain": 10, "giraldo": 10, "verhelst": 10, "enlarg": 10, "joint": 10, "transact": 10, "70": 10, "1160": 10, "1174": 10, "aug": 10, "doi": 10, "1109": 10, "tc": 10, "3059962": 10, "uniform": 10, "divers": 10, "dataflow": 10, "test": 10, "europ": 10, "exhibit": 10, "date": 10, "antwerp": 10, "belgium": 10, "220": 10, "225": 10, "23919": 10, "date54114": 10, "9774728": 10, "slide": 10, "video": 10, "symon": 10, "base": [10, 11], "3rd": 10, "artifici": 10, "intellig": 10, "aica": 10, "washington": 10, "dc": 10, "usa": 10, "aicas51828": 10, "9458493": 
10, "coseman": 10, "papista": 10, "bhattacharje": 10, "deback": 10, "mallik": 10, "verkest": 10, "opportun": 10, "emerg": 10, "analog": 10, "electron": 10, "devic": 10, "meet": 10, "iedm": 10, "san": 10, "francisco": 10, "ca": 10, "29": 10, "iedm13553": 10, "9372006": 10, "accuraci": 10, "trade": 10, "contemporari": 10, "9458553": 10, "colleman": 10, "verelst": 10, "tuytelaar": 10, "processor": 10, "dynam": 10, "ifip": 10, "29th": 10, "veri": 10, "larg": [10, 13], "scale": 10, "vlsi": 10, "soc": 10, "singapor": 10, "soc53125": 10, "9607013": 10, "zhu": 10, "sun": 10, "mobil": 10, "transform": 10, "4th": 10, "incheon": 10, "korea": 10, "republ": 10, "142": 10, "145": 10, "aicas54282": 10, "9869945": 10, "goetschalckx": 10, "enabl": 10, "2023": 10, "karl": 10, "heterogen": 10, "exploit": 10, "fine": 10, "grain": 10, "48550": 10, "2212": 10, "10612": 10, "fasfou": 10, "genet": 10, "date56975": 10, "10137070": 10, "modularli": 11, "easili": 11, "adapt": 11, "sequenc": 11, "determin": 11, "mainstag": 11, "initi": 11, "acceleratorparserstag": 11, "simplesavestag": 11, "receiv": 11, "workloadstag": 11, "sm": 11, "minim": 11, "lomastag": 11, "tm": 11, "costmodelstag": 11, "accelerator_path": 11, "arg": 11, "onnx_model_path": 11, "mapping_path": 11, "pattern": 11, "loma_lpf_limit": 11, "loma_show_progress_bar": 11, "true": [11, 13], "show": 11, "progress": 11, "bar": 11, "while": 11, "over": 11, "correspond": 11, "similar": 11, "those": 11, "pipelin": [11, 13], "remain": 11, "said": 11, "further": 11, "label": 11, "below": 11, "fed": 11, "far": 11, "discuss": 11, "last": 11, "revers": 11, "hold": 11, "finish": 11, "conbim": 11, "yield": 11, "chain": 11, "manipul": 11, "invoc": 11, "lowest": 11, "still": 11, "For": [11, 13], "miss": 11, "__init__": 11, "workloadparserstag": 11, "workload_path": 11, "generalparameteriteratorstag": 11, "whose": 11, "predetermin": 11, "plottemporalmappingsstag": 11, "substag": 11, "keep": 11, "minimalenergystag": 11, "list_of_cal": 11, 
"minimaledpstag": 11, "sumstag": 11, "sum": 11, "listifystag": 11, "instead": [11, 13], "removeextrainfostag": 11, "strip": 11, "info": 11, "subcal": 11, "cachebeforeyieldstag": 11, "cach": 11, "break": 11, "top": 11, "bottom": 11, "skipifdumpexistsstag": 11, "check": 11, "alreadi": 11, "skip": 11, "multiprocessingspawnstag": 11, "multiprocess": 11, "multiprocessinggatherstag": 11, "completesavestag": 11, "picklesavestag": 11, "dumpstag": 11, "salsastag": 11, "simul": 11, "anneal": 11, "temporalorderingconversionstag": 11, "spatialmappingconversionstag": 11, "auser": 11, "arrai": 11, "present": [11, 13], "inner": 11, "most": 11, "config": 11, "let": 11, "sai": 11, "metric": 11, "easiest": 11, "accord": 11, "intend": 11, "guarante": 11, "correct": 11, "taken": 11, "inherit": 11, "abstract": 11, "callabl": 11, "kwarg": 11, "second": 11, "extra_info": 11, "reduct": 11, "statement": 11, "outsid": 11, "happen": 11, "regard": 12, "major": 12, "compon": 12, "recommend": 13, "context": 13, "ml": 13, "often": 13, "recogn": 13, "complet": 13, "conv": 13, "qlinearconv": 13, "matmul": 13, "gemm": 13, "assum": 13, "accelerat": 13, "incur": 13, "zero": 13, "feel": 13, "free": 13, "open": 13, "issu": 13, "yourself": 13, "rather": 13, "avoid": 13, "origin": 13, "discard": 13, "doesn": 13, "do": 13, "onnx_model": 13, "modelproto": 13, "my_model_with_internal_data": 13, "save_model": 13, "save_as_external_data": 13, "all_tensors_to_one_fil": 13, "locat": 13, "external_data_filenam": 13, "size_threshold": 13, "convert_attribut": 13, "fals": 13, "raw": 13, "specif": 13, "directori": 13, "shape_infer": 13, "my_model": 13, "inferred_model": 13, "infer_shap": 13, "my_inferred_model": 13, "moreov": 13, "repres": 13, "equat": 13, "small": 13, "wherea": 13, "alwai": 13, "freeli": 13, "dimension_rel": 13, "relationship": 13, "stride": 13, "filter": 13, "dilat": 13, "rate": 13, "loop_dim_s": 13, "left": 13, "hand": 13, "operand_precis": 13, "partial": 13, "o_fin": 13, "final": 13, 
"operand_sourc": 13, "come": 13, "constant_operand": 13, "constant": 13, "prior": 13, "none": 13, "readm": 13, "notat": 13, "batch": 13, "row": 13, "column": 13, "kernel": 13}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"zigzag": [0, 2, 6, 7, 10], "api": 0, "get_hardware_performance_zigzag": 0, "code": [1, 2], "document": [1, 2, 3, 6], "contribut": 2, "guidelin": 2, "upgrad": 2, "project": 2, "version": 2, "develop": 2, "write": 2, "new": [2, 10], "part": 2, "gener": [2, 10], "build": 2, "local": 2, "which": 2, "support": [2, 10, 13], "doxygen": 2, "futur": 3, "chang": 3, "framework": 3, "get": 4, "start": 4, "first": [4, 10], "run": 4, "analyz": 4, "result": [4, 11], "hardwar": 5, "architectur": 5, "oper": [5, 13], "unit": 5, "arrai": 5, "memori": 5, "instanc": 5, "hierarchi": 5, "core": [5, 10], "hw": 5, "acceler": 5, "model": [5, 10, 11, 13], "exampl": 5, "specif": 5, "set": 5, "refer": 5, "welcom": 6, "": [6, 13], "content": 6, "indic": 6, "tabl": 6, "instal": 7, "packag": 7, "manual": [7, 13], "clone": 7, "prerequisit": 7, "map": [8, 10, 11], "user": [8, 12], "defin": 8, "constraint": 8, "output": 9, "simplesavestag": 9, "completesavestag": 9, "creat": [9, 11], "custom": [9, 11], "savestag": 9, "public": 10, "The": [10, 11], "idea": 10, "detail": 10, "latenc": 10, "explan": 10, "tempor": [10, 11], "search": 10, "engin": 10, "differ": 10, "design": 10, "space": 10, "explor": 10, "case": 10, "studi": 10, "extens": 10, "cross": 10, "layer": [10, 13], "depth": 10, "schedul": 10, "multi": 10, "fuse": 10, "stage": 11, "introduct": 11, "main": 11, "entri": 11, "point": 11, "sequenti": 11, "call": 11, "back": 11, "pass": 11, "implement": 11, "input": 11, "parser": 11, "iter": 11, "plot": 11, "reduc": 11, "optim": 11, "save": [11, 13], "dump": 11, "spatial": 11, "cost": 11, "your": [11, 13], "guid": 12, "workload": 13, "onnx": 13, "extern": 13, "data": 13, "infer": 13, "an": 13, "shape": 13, "definit": 13}, "envversion": {"sphinx.domains.c": 3, 
"sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"ZigZag API": [[0, "zigzag-api"]], "get_hardware_performance_zigzag()": [[0, "get-hardware-performance-zigzag"]], "Code Documentation": [[1, "code-documentation"]], "Contribute": [[2, "contribute"]], "Contributing guidelines": [[2, "contributing-guidelines"]], "Upgrading the project version (for ZigZag developers)": [[2, "upgrading-the-project-version-for-zigzag-developers"]], "Documentation": [[2, "documentation"], [3, "documentation"]], "Writing new parts for the general documentation": [[2, "writing-new-parts-for-the-general-documentation"]], "Building the general documentation locally": [[2, "building-the-general-documentation-locally"]], "Writing code which supports the code documentation with Doxygen": [[2, "writing-code-which-supports-the-code-documentation-with-doxygen"]], "Building the code documentation locally": [[2, "building-the-code-documentation-locally"]], "Future changes": [[3, "future-changes"]], "Framework": [[3, "framework"]], "Getting Started": [[4, "getting-started"]], "First run": [[4, "first-run"]], "Analyzing results": [[4, "analyzing-results"]], "Hardware Architecture": [[5, "hardware-architecture"]], "Operational Unit": [[5, "operational-unit"]], "Operational Array": [[5, "operational-array"]], "Memory Instance": [[5, "memory-instance"]], "Memory Hierarchy": [[5, "memory-hierarchy"]], "Core": [[5, "core"]], "HW Accelerator Model": [[5, "hw-accelerator-model"]], "Modelled examples": [[5, "modelled-examples"]], "Specific settings": [[5, "specific-settings"]], "References": [[5, "references"]], "Welcome to ZigZag\u2019s documentation!": [[6, "welcome-to-zigzag-s-documentation"]], "Contents:": [[6, null]], "Indices and tables": [[6, "indices-and-tables"]], "Installing 
ZigZag": [[7, "installing-zigzag"]], "Installing as a package": [[7, "installing-as-a-package"]], "Manual clone": [[7, "manual-clone"]], "Prerequisites": [[7, "prerequisites"]], "Installation": [[7, "installation"]], "Mapping": [[8, "mapping"]], "User-defined mapping constraints": [[8, "user-defined-mapping-constraints"]], "Outputs": [[9, "outputs"]], "SimpleSaveStage": [[9, "simplesavestage"]], "CompleteSaveStage": [[9, "completesavestage"]], "Creating a custom SaveStage": [[9, "creating-a-custom-savestage"]], "Publications": [[10, "publications"]], "The general idea of ZigZag": [[10, "the-general-idea-of-zigzag"]], "Detailed latency model explanation": [[10, "detailed-latency-model-explanation"]], "The new temporal mapping search engine": [[10, "the-new-temporal-mapping-search-engine"]], "Different design space exploration case studies": [[10, "different-design-space-exploration-case-studies"]], "Extension to support cross-layer depth-first scheduling": [[10, "extension-to-support-cross-layer-depth-first-scheduling"]], "Extension to support multi-core layer-fused scheduling": [[10, "extension-to-support-multi-core-layer-fused-scheduling"]], "Stages": [[11, "stages"]], "Introduction": [[11, "introduction"]], "The main entry point": [[11, "the-main-entry-point"]], "The sequential call of stages": [[11, "the-sequential-call-of-stages"]], "The back passing of results": [[11, "the-back-passing-of-results"]], "Implemented stages": [[11, "implemented-stages"]], "Input parser stages": [[11, "input-parser-stages"]], "Iterator stage": [[11, "iterator-stage"]], "Plot stages": [[11, "plot-stages"]], "Reduce stages": [[11, "reduce-stages"]], "Optimization stages": [[11, "optimization-stages"]], "Save and dump stages": [[11, "save-and-dump-stages"]], "Temporal mapping stages": [[11, "temporal-mapping-stages"]], "Spatial mapping stages": [[11, "spatial-mapping-stages"]], "Cost model stages": [[11, "cost-model-stages"]], "Creating your custom stage": [[11, 
"creating-your-custom-stage"]], "User Guide": [[12, "user-guide"]], "Workload": [[13, "workload"]], "Onnx models": [[13, "onnx-models"]], "Supported onnx operators": [[13, "supported-onnx-operators"]], "Saving your onnx model with external data": [[13, "saving-your-onnx-model-with-external-data"]], "Inferring an onnx model\u2019s shapes": [[13, "inferring-an-onnx-model-s-shapes"]], "Manual layer definition": [[13, "manual-layer-definition"]]}, "indexentries": {}})
\ No newline at end of file
+Search.setIndex({"docnames": ["api", "code-documentation", "contribute", "future", "getting-started", "hardware", "index", "installation", "mapping", "outputs", "publications", "stages", "user-guide", "workload"], "filenames": ["api.rst", "code-documentation.rst", "contribute.rst", "future.rst", "getting-started.rst", "hardware.rst", "index.rst", "installation.rst", "mapping.rst", "outputs.rst", "publications.rst", "stages.rst", "user-guide.rst", "workload.rst"], "titles": ["ZigZag API", "Code Documentation", "Contribute", "Future changes", "Getting Started", "Hardware Architecture", "Welcome to ZigZag\u2019s documentation!", "Installing ZigZag", "Mapping", "Outputs", "Publications", "Stages", "User Guide", "Workload"], "terms": {"onc": [0, 7], "i": [0, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13], "avail": [0, 2, 5, 7], "your": [0, 2, 7, 8], "python": [0, 2, 4, 7, 13], "path": [0, 4, 13], "you": [0, 2, 3, 4, 5, 7, 8, 9, 11, 13], "can": [0, 2, 3, 4, 5, 7, 8, 9, 11, 13], "import": [0, 2, 4, 5, 13], "ani": [0, 2, 7, 11], "file": [0, 2, 4, 5, 7, 8, 9, 11, 13], "from": [0, 5, 11, 13], "thi": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13], "function": [0, 2, 7, 11], "take": [0, 3, 5, 7, 13], "an": [0, 2, 4, 5, 7, 9, 11], "workload": [0, 4, 6, 8, 11, 12], "hardwar": [0, 4, 6, 8, 11, 12, 13], "architectur": [0, 4, 6, 10, 11, 12], "map": [0, 3, 4, 5, 6, 9, 12], "return": [0, 9, 11], "perform": [0, 5, 10], "execut": [0, 2, 4, 5, 8, 11, 13], "model": [0, 3, 4, 6], "": [0, 2, 3, 4, 5, 8, 10, 11], "layer": [0, 4, 6, 8, 9, 11], "under": [0, 2, 4, 8], "given": [0, 4, 11], "constraint": [0, 4], "energi": [0, 3, 4, 5, 9, 10, 11], "latenc": [0, 3, 4, 5, 6, 9, 11], "cme": [0, 11], "acceler": [0, 4, 6, 8, 9, 10, 11, 13], "opt": 0, "dump_filename_pattern": [0, 4], "output": [0, 4, 5, 6, 11, 12, 13], "datetim": [0, 11], "json": [0, 9, 11], "pickle_filenam": 0, "list_of_cm": 0, "pickl": [0, 11], "The": [0, 1, 2, 4, 5, 6, 8, 9, 12, 13], "input": [0, 3, 4, 5, 8, 9, 13], "ar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 13], "A": [0, 5, 6, 8, 10, 13], "neural": [0, 5, 10], "network": [0, 4, 5, 10], "defin": [0, 4, 5, 9, 10, 11, 13], "onnx": [0, 3, 4, 8, 11], "format": [0, 1, 2, 9], "own": [0, 11, 13], "high": [0, 5, 10], "level": [0, 5, 11], "hw": [0, 3, 4, 6, 8, 11], "descript": [0, 8, 11], "specifi": [0, 5, 9], "core": [0, 4, 6, 8, 11, 13], "alloc": [0, 5, 8, 10, 11], "spatial": [0, 3, 5, 8, 9, 10, 13], "option": [0, 5], "tempor": [0, 3, 4, 6, 9], "order": [0, 2, 3, 5, 10, 11], "memori": [0, 3, 6, 8, 9, 10, 11, 13], "operand": [0, 5, 8, 13], "link": [0, 1, 5, 8, 13], "optim": [0, 4, 6, 10], "target": 0, "It": [0, 4, 9, 11, 13], "edp": [0, 11], "delai": 0, "product": 0, "name": [0, 4, 5, 8, 13], "result": [0, 6, 9], "which": [0, 4, 5, 8, 9, 11, 13], "includ": [0, 2, 3, 5, 9], "all": [0, 2, 4, 5, 9, 11, 13], "detail": [0, 2, 6, 7, 11], "metadata": 0, "analys": 0, "debug": 0, "number": [0, 5, 9], "indic": [0, 13], "overal": 0, "consum": 0, "run": [0, 2, 6, 7, 8, 11], "user": [0, 3, 4, 5, 6, 7, 11], "wai": [0, 1, 2, 4, 5, 9, 11, 13], "cycl": [0, 5], "count": 0, "collect": 0, "cost": [0, 3, 4, 5, 6, 13], "evalu": [0, 5, 11], "stand": 0, "we": [0, 2, 3, 4, 5, 11], "demonstr": 0, "how": [0, 2, 4, 5, 7, 8, 9, 11], "us": [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 13], "multipl": [0, 5, 6, 10, 11, 13], "demo": 0, "comment": [1, 2], "within": [1, 5, 8, 11, 13], "sourc": [1, 2], "zigzag": [1, 4, 5, 8, 9, 11, 12, 13], "framework": [1, 2, 4, 6, 7, 8, 10, 11, 12, 13], "support": [1, 3, 4, 5, 6, 11], "auto": [1, 10], "doxygen": 1, "automat": [1, 4, 5, 6, 8, 11, 13], "updat": [1, 2, 3, 11], "soon": 1, "somebodi": 1, "push": 1, "someth": 1, "master": 1, "branch": 1, "github": [1, 7, 10], "repo": [1, 13], "project": [1, 6, 10], "follow": [1, 2, 4, 5, 8, 9, 11, 12, 13], "access": [1, 3, 5], "latest": 1, "version": [1, 6, 11], "when": [2, 8, 11, 13], "pleas": [2, 9, 11], "consid": 2, "googl": 2, "style": 2, "guid": [2, 3, 6], "docstr": 2, "class": [2, 3, 11], "method": [2, 5, 9], "exampl": [2, 4, 8, 9, 11, 
13], "found": [2, 11, 13], "throughout": 2, "here": [2, 3, 4, 8, 10, 13], "accordingli": 2, "In": [2, 4, 5, 6, 9, 11, 13], "packag": [2, 6], "call": [2, 5], "bumpver": 2, "twine": 2, "These": [2, 11], "instal": [2, 6], "pip": [2, 7], "first": [2, 5, 6, 11], "pull": 2, "make": [2, 3, 5, 7, 9], "sure": 2, "have": [2, 4, 5, 11, 13], "remot": 2, "cahng": 2, "merg": 2, "conflict": 2, "chang": [2, 5, 6, 11], "commit": 2, "Then": [2, 13], "command": [2, 4, 13], "patch": 2, "m": [2, 5, 10], "upload": 2, "dist": 2, "zigzag_ds": 2, "x": [2, 5], "y": [2, 5], "z": [2, 5], "whl": 2, "dse": [2, 6, 7], "tar": 2, "gz": 2, "provid": [2, 4, 5, 6, 7, 8, 11, 12, 13], "sever": 2, "differ": [2, 4, 5, 6, 9, 11, 13], "There": [2, 5, 9], "mani": [2, 5], "public": [2, 6], "relat": 2, "page": [2, 6], "allow": [2, 5, 8], "everyon": 2, "get": [2, 6, 7], "familiar": 2, "more": [2, 3, 4, 5, 7, 11, 12, 13], "about": [2, 9, 11], "implement": 2, "ad": [2, 5, 8], "mandatori": 2, "what": [2, 5, 8, 9, 11], "doe": 2, "achiev": [2, 5], "newli": 2, "explicit": 2, "resid": [2, 11], "doc": 2, "folder": [2, 11], "restructuredtext": 2, "rst": 2, "decid": 2, "would": [2, 13], "best": [2, 11], "fit": 2, "exist": [2, 11], "one": [2, 4, 5, 11], "If": [2, 7, 9, 11, 13], "creat": [2, 8], "lower": [2, 5], "case": [2, 3, 6, 13], "letter": [2, 13], "hyphen": 2, "between": [2, 4, 6, 13], "word": [2, 5], "after": [2, 5, 7, 11], "need": [2, 4, 5, 8, 13], "add": [2, 3, 5, 7, 13], "toctre": 2, "index": [2, 6], "same": [2, 3, 5], "webpag": 2, "sphinx": 2, "should": [2, 5, 8, 9, 13], "both": [2, 3], "press": 2, "theme": 2, "easi": [2, 9], "through": [2, 4, 5, 6, 7, 8, 10, 11, 13], "requir": [2, 4, 5, 7, 8, 9, 11, 13], "txt": [2, 7], "cd": 2, "r": [2, 5, 7], "simpli": [2, 11], "b": [2, 5, 13], "html": 2, "entri": [2, 8], "point": [2, 6], "guidlin": 2, "paramet": [2, 5, 11], "constructor": 2, "download": 2, "describ": [2, 5, 13], "successfulli": 2, "configur": [2, 10], "done": 2, "either": [2, 5], "gui": 2, "conf": 2, "find": 
[3, 4, 6, 11], "plan": 3, "oper": [3, 8, 11], "ancestor": 3, "layernod": [3, 13], "dummynod": [3, 13], "fix": 3, "loop": [3, 10, 11, 13], "multi": [3, 6], "dimension": 3, "unrol": [3, 5], "fraction": 3, "account": [3, 11, 13], "bandwidth": [3, 5], "loma": [3, 4, 10, 11], "memoryalloc": 3, "besid": [3, 4, 5, 11], "capac": [3, 5], "lpf": 3, "limit": [3, 10], "visualis": 3, "tutori": 3, "remak": 3, "tabl": [3, 5], "without": 3, "df": [3, 5], "stage": [3, 4, 6, 9, 12], "stack": 3, "combin": [3, 5, 8, 9, 11], "common": 3, "versatil": 4, "tool": 4, "estim": [4, 6], "dl": [4, 6], "design": [4, 5, 6, 11], "multitud": 4, "set": [4, 11], "As": [4, 11], "step": [4, 11], "nn": [4, 5, 13], "onto": [4, 6, 8], "go": 4, "alexnet": 4, "ha": [4, 5, 11, 12, 13], "been": 4, "shape": 4, "infer": [4, 5], "mean": 4, "tensor": [4, 5, 13], "intermedi": [4, 5, 13], "inform": [4, 5, 8, 9, 11, 12, 13], "know": [4, 5, 8, 9, 13], "correctli": [4, 5, 13], "tpu": [4, 5], "like": [4, 11, 13], "tpu_lik": 4, "py": [4, 8, 11, 13], "must": [4, 11], "suggest": 4, "resourc": [4, 6, 8], "alexnet_on_tpu_lik": 4, "gener": [4, 5, 6, 8, 9, 11, 12], "ran": 4, "main": [4, 5, 7, 9, 13], "pars": [4, 11, 13], "contain": [4, 8, 13], "program": 4, "flow": [4, 11], "document": [4, 7, 11, 12], "main_onnx": [4, 13], "note": [4, 9], "construct": [4, 5], "becaus": 4, "object": [4, 5, 9, 11, 13], "respect": [4, 5, 9], "modul": [4, 6], "other": [4, 5, 11, 13], "also": [4, 5, 7, 8, 9, 11, 13], "see": [4, 9, 13], "section": [4, 5, 9, 11], "manual": [4, 5, 6, 8, 11], "definit": [4, 8, 9, 11], "resnet18": [4, 8, 13], "salsa": [4, 11], "search": [4, 6], "engin": [4, 6, 11], "util": [4, 9], "schedul": [4, 5, 6, 11], "than": 4, "main_onnx_salsa": 4, "dure": 4, "save": [4, 9], "depend": [4, 7, 13], "total": [4, 11], "five": [4, 12], "each": [4, 5, 9, 11, 13], "node": [4, 8, 9, 11], "onnxmodelparserstag": [4, 8, 11, 13], "wa": 4, "minimallatencystag": [4, 11], "refer": [4, 13], "introduc": 5, "concept": [5, 11], "well": 5, 
"known": 5, "start": [5, 6, 7, 11], "smallest": 5, "build": [5, 12, 13], "block": [5, 12], "work": [5, 9], "our": [5, 11], "up": [5, 11], "toward": [5, 10], "summat": 5, "accumul": 5, "across": [5, 10, 11], "data": [5, 9], "activ": 5, "train": 5, "weight": [5, 13], "typic": [5, 8], "multipli": 5, "two": [5, 9], "element": [5, 11], "attribut": [5, 9, 13], "input_precis": 5, "list": [5, 11, 13], "precis": [5, 13], "bit": [5, 13], "output_precis": 5, "energy_cost": 5, "singl": [5, 11], "e": [5, 8, 10, 11, 13], "g": [5, 8, 11], "area": [5, 10], "overhead": 5, "inferenc": 5, "million": 5, "parallel": [5, 8, 13], "significantli": 5, "speed": 5, "comput": [5, 6, 8, 10, 13], "increas": 5, "effici": 5, "cover": 5, "later": [5, 11], "dimens": [5, 11, 13], "size": [5, 13], "explain": [5, 9, 11], "introduct": 5, "operational_unit": 5, "built": 5, "dict": 5, "kei": [5, 8], "being": [5, 11], "identifi": 5, "d1": 5, "d2": 5, "valu": [5, 11, 13], "along": 5, "store": 5, "attach": 5, "hierarch": 5, "fashion": 5, "big": 5, "term": 5, "write": [5, 8], "read": [5, 13], "its": [5, 7, 9, 11], "port": 5, "r_bw": 5, "w_bw": 5, "per": 5, "r_cost": 5, "w_cost": 5, "r_port": 5, "w_port": 5, "rw_port": 5, "address": 5, "receiv": [5, 11], "correspond": [5, 11], "For": [5, 11, 13], "now": 5, "assum": [5, 13], "1": [5, 10], "prefetch": 5, "behavior": 5, "thank": 5, "determinist": 5, "dataflow": [5, 10], "min_r_granular": 5, "min_w_granular": 5, "minim": [5, 11], "granular": 5, "better": 5, "half": 5, "quarter": 5, "pattern": [5, 11], "wordlength": 5, "256": 5, "100": 5, "128": 5, "onli": [5, 9, 11, 13], "50": 5, "while": [5, 11], "spec": [5, 7], "encod": [5, 8], "interconnect": [5, 11], "add_memori": 5, "where": [5, 11, 13], "connect": [5, 11], "To": [5, 11], "anoth": [5, 11], "decoupl": 5, "algorithm": [5, 6, 8, 10, 13], "side": [5, 13], "oppos": 5, "typical": 5, "o": [5, 8, 13], "w": [5, 8, 10], "think": [5, 11], "virtual": [5, 13], "actual": [5, 13], "memory_operand_link": [5, 8, 13], 
"similarli": 5, "form": 5, "accompani": 5, "served_dimens": 5, "serv": [5, 11], "hot": 5, "tupl": [5, 11], "lastli": 5, "assign": 5, "movev": 5, "possibl": [5, 13], "four": 5, "type": [5, 12, 13], "movement": 5, "fh": 5, "th": 5, "low": 5, "fl": 5, "tl": 5, "At": 5, "time": [5, 8], "syntax": 5, "port_typ": 5, "_port_": 5, "port_numb": 5, "rw": 5, "equal": 5, "altern": [5, 7, 13], "default": [5, 8], "probid": 5, "intern": [5, 7, 10, 11], "memoryhierarchi": 5, "extend": 5, "networkx": 5, "digraph": 5, "so": [5, 11, 13], "operational_arrai": 5, "new": [5, 6, 11], "memorylevel": 5, "graph": [5, 11, 13], "memory_inst": 5, "memoryinst": 5, "port_alloc": 5, "direction": 5, "abov": 5, "togeth": [5, 13], "id": [5, 8, 13], "memory_hierarchi": 5, "core_set": 5, "compris": 5, "global_buff": 5, "share": 5, "current": [5, 9], "un": 5, "repositori": [5, 7], "5": 5, "dnn": [5, 10], "meta": 5, "prototyp": 5, "2": 5, "edg": [5, 13], "3": [5, 7], "ascend": 5, "4": [5, 10], "tesla": 5, "npu": 5, "depth": [5, 6], "research": 5, "fair": 5, "relev": [5, 9], "comparison": 5, "normal": 5, "them": [5, 11], "1024": [5, 13], "mac": 5, "maxim": 5, "2mb": 5, "global": 5, "buffer": 5, "gb": 5, "kept": 5, "local": 5, "shown": 5, "idx": 5, "7": 5, "9": 5, "variant": 5, "everi": [5, 8], "chip": 5, "denot": 5, "end": [5, 7, 11], "6": [5, 10, 11], "8": [5, 7, 10], "10": [5, 10], "k": [5, 10, 13], "channel": [5, 13], "c": [5, 13], "ox": [5, 13], "oi": [5, 13], "featur": 5, "fx": [5, 13], "fy": [5, 13], "h": [5, 10], "sumbul": [5, 10], "t": [5, 8, 10, 13], "f": 5, "wu": [5, 10], "li": 5, "sarwar": 5, "koven": 5, "murphi": 5, "trotzki": 5, "cai": 5, "ansari": 5, "d": [5, 10], "morri": 5, "liu": [5, 10], "kim": 5, "beign": [5, 10], "lab": 5, "system": [5, 10, 11], "integr": [5, 10], "vr": 5, "custom": [5, 7, 8, 13], "power": 5, "7nm": 5, "technologi": 5, "codec": 5, "avatar": 5, "2022": [5, 10], "ieee": [5, 10], "circuit": [5, 10], "confer": [5, 10], "cicc": 5, "pp": [5, 10], "01": 5, "08": 5, "n": [5, 
10], "p": [5, 10], "jouppi": 5, "young": 5, "patil": 5, "patterson": 5, "agraw": 5, "bajwa": 5, "bate": 5, "bhatia": 5, "boden": 5, "borcher": 5, "boyl": 5, "l": [5, 10], "cantin": 5, "chao": 5, "clark": 5, "j": 5, "coriel": 5, "dalei": 5, "dau": 5, "dean": 5, "gelb": 5, "v": [5, 10], "ghaemmaghami": 5, "gottipati": 5, "gulland": 5, "hagmann": 5, "ho": 5, "hogberg": 5, "hu": 5, "hundt": 5, "hurt": 5, "ibarz": 5, "jaffei": 5, "jaworski": 5, "kaplan": 5, "khaitan": 5, "killebrew": 5, "koch": 5, "kumar": 5, "laci": 5, "laudon": 5, "law": 5, "le": 5, "leari": 5, "luck": 5, "lundin": 5, "mackean": 5, "maggior": 5, "mahoni": 5, "miller": 5, "nagarajan": 5, "narayanaswami": 5, "ni": 5, "nix": 5, "norri": 5, "omernick": 5, "penukonda": 5, "phelp": 5, "ross": 5, "salek": 5, "samadiani": 5, "severn": 5, "sizikov": 5, "snelham": 5, "souter": 5, "steinberg": 5, "swing": 5, "tan": 5, "thorson": 5, "tian": 5, "toma": 5, "tuttl": 5, "vasudevan": 5, "walter": 5, "wang": 5, "wilcox": 5, "yoon": 5, "datacent": 5, "analysi": 5, "process": [5, 11], "sigarch": 5, "archit": 5, "vol": [5, 10], "45": 5, "12": 5, "jun": 5, "2017": 5, "yazdanbakhsh": 5, "seshadri": 5, "akin": 5, "convolut": [5, 8, 13], "arxiv": [5, 10], "print": [5, 10], "2102": 5, "10423": 5, "feb": 5, "2021": [5, 10], "liao": 5, "tu": 5, "xia": 5, "zhou": 5, "yuan": 5, "scalabl": 5, "unifi": 5, "ubiquit": 5, "deep": [5, 6, 10], "industri": 5, "track": 5, "paper": [5, 10, 13], "symposium": [5, 10], "hpca": [5, 10], "789": 5, "801": 5, "talp": 5, "sarma": 5, "venkataramanan": 5, "bannon": 5, "mcgee": 5, "floer": 5, "jalot": 5, "hsiong": 5, "arora": 5, "gorti": 5, "sachdev": 5, "solut": 5, "full": 5, "self": [5, 9], "drive": 5, "micro": 5, "40": 5, "25": 5, "35": 5, "2020": [5, 10], "space": [6, 11], "explor": [6, 11], "learn": 6, "bridg": 6, "gap": 6, "decis": 6, "special": 6, "fast": [6, 10], "accur": 6, "analyt": [6, 10], "crucial": 6, "part": [6, 8], "clone": 6, "analyz": [6, 10], "api": [6, 7], 
"get_hardware_performance_zigzag": 6, "futur": 6, "contribut": [6, 13], "guidelin": [6, 13], "upgrad": 6, "develop": 6, "idea": 6, "explan": 6, "studi": 6, "extens": 6, "cross": 6, "fuse": 6, "code": 6, "re": 7, "interest": [7, 11], "modif": [7, 9], "directli": 7, "venv": 7, "conda": 7, "environ": 7, "look": [7, 11], "want": [7, 8, 11, 13], "git": 7, "com": 7, "kuleuven": 7, "mica": 7, "http": 7, "anaconda": 7, "argument": [7, 11], "autom": [8, 10], "some": [8, 11, 13], "aspect": [8, 9, 11], "interfac": 8, "core_alloc": [8, 13], "spatial_map": [8, 9, 13], "strategi": [8, 13], "spatialmappinggeneratorstag": [8, 11, 13], "hierarchi": [8, 9, 11], "extra": [8, 11], "flexibl": 8, "scheme": 8, "don": 8, "put": 8, "safe": 8, "bet": 8, "copi": [8, 11], "exact": 8, "detect": 8, "dictionari": [8, 13], "interpret": 9, "predefin": 9, "costmodelevalu": [9, 11], "knowledg": 9, "irrelev": 9, "handl": 9, "complexhandl": 9, "insid": [9, 11, 13], "represent": [9, 11], "invok": 9, "pass": 9, "__simplejsonrepr__": 9, "convert": [9, 11, 13], "off": [9, 10], "load": [9, 13], "reli": 9, "def": 9, "simpl": [9, 11], "energy_tot": 9, "latency_total2": 9, "standard": 9, "filename_pattern": [9, 11], "lose": 9, "etc": [9, 11], "concern": 9, "__jsonrepr__": 9, "temporal_map": 9, "mem_utili_shar": 9, "word_access": 9, "memory_word_access": 9, "operational_energi": 9, "mac_energi": 9, "memory_energi": 9, "mem_energi": 9, "energy_breakdown_per_level": 9, "energy_breakdown": 9, "energy_breakdown_per_level_per_operand": 9, "energy_breakdown_furth": 9, "latency_without_onloading_without_offload": 9, "latency_total0": 9, "latency_with_onloading_without_offload": 9, "latency_total1": 9, "latency_with_onloading_with_offload": 9, "goal": [9, 11], "straightforward": 9, "care": 9, "certain": 9, "modifi": [9, 11], "parser": 9, "pointer": 10, "mei": 10, "houshmand": 10, "jain": 10, "giraldo": 10, "verhelst": 10, "enlarg": 10, "joint": 10, "transact": 10, "70": 10, "1160": 10, "1174": 10, "aug": 10, "doi": 
10, "1109": 10, "tc": 10, "3059962": 10, "uniform": 10, "divers": 10, "test": 10, "europ": 10, "exhibit": 10, "date": 10, "antwerp": 10, "belgium": 10, "220": 10, "225": 10, "23919": 10, "date54114": 10, "9774728": 10, "slide": 10, "video": 10, "symon": 10, "base": [10, 11], "3rd": 10, "artifici": 10, "intellig": 10, "aica": 10, "washington": 10, "dc": 10, "usa": 10, "aicas51828": 10, "9458493": 10, "coseman": 10, "papista": 10, "bhattacharje": 10, "deback": 10, "mallik": 10, "verkest": 10, "opportun": 10, "emerg": 10, "analog": 10, "electron": 10, "devic": 10, "meet": 10, "iedm": 10, "san": 10, "francisco": 10, "ca": 10, "29": 10, "iedm13553": 10, "9372006": 10, "accuraci": 10, "trade": 10, "contemporari": 10, "9458553": 10, "colleman": 10, "verelst": 10, "tuytelaar": 10, "processor": 10, "dynam": 10, "ifip": 10, "29th": 10, "veri": 10, "larg": [10, 13], "scale": 10, "vlsi": 10, "soc": 10, "singapor": 10, "soc53125": 10, "9607013": 10, "zhu": 10, "sun": 10, "mobil": 10, "transform": 10, "4th": 10, "incheon": 10, "korea": 10, "republ": 10, "142": 10, "145": 10, "aicas54282": 10, "9869945": 10, "goetschalckx": 10, "enabl": 10, "2023": 10, "karl": 10, "heterogen": 10, "exploit": 10, "fine": 10, "grain": 10, "48550": 10, "2212": 10, "10612": 10, "fasfou": 10, "genet": 10, "date56975": 10, "10137070": 10, "modularli": 11, "easili": 11, "adapt": 11, "sequenc": 11, "determin": 11, "mainstag": 11, "initi": 11, "acceleratorparserstag": 11, "simplesavestag": 11, "workloadstag": 11, "sm": 11, "lomastag": 11, "tm": 11, "costmodelstag": 11, "accelerator_path": 11, "arg": 11, "onnx_model_path": 11, "mapping_path": 11, "loma_lpf_limit": 11, "loma_show_progress_bar": 11, "true": [11, 13], "show": 11, "progress": 11, "bar": 11, "over": 11, "similar": 11, "those": 11, "pipelin": [11, 13], "remain": 11, "said": 11, "further": 11, "label": 11, "below": 11, "fed": 11, "far": 11, "discuss": 11, "last": 11, "revers": 11, "hold": 11, "finish": 11, "conbim": 11, "yield": 11, "chain": 11, 
"manipul": 11, "invoc": 11, "lowest": 11, "still": 11, "miss": 11, "__init__": 11, "workloadparserstag": 11, "workload_path": 11, "generalparameteriteratorstag": 11, "whose": 11, "predetermin": 11, "plottemporalmappingsstag": 11, "substag": 11, "keep": 11, "minimalenergystag": 11, "list_of_cal": 11, "minimaledpstag": 11, "sumstag": 11, "sum": 11, "listifystag": 11, "instead": [11, 13], "removeextrainfostag": 11, "strip": 11, "info": 11, "subcal": 11, "cachebeforeyieldstag": 11, "cach": 11, "break": 11, "top": 11, "bottom": 11, "skipifdumpexistsstag": 11, "check": 11, "alreadi": 11, "skip": 11, "multiprocessingspawnstag": 11, "multiprocess": 11, "multiprocessinggatherstag": 11, "completesavestag": 11, "picklesavestag": 11, "dumpstag": 11, "salsastag": 11, "simul": 11, "anneal": 11, "temporalorderingconversionstag": 11, "spatialmappingconversionstag": 11, "auser": 11, "arrai": 11, "present": [11, 13], "inner": 11, "most": 11, "config": 11, "let": 11, "sai": 11, "metric": 11, "easiest": 11, "accord": 11, "intend": 11, "behaviour": 11, "guarante": 11, "correct": 11, "taken": 11, "inherit": 11, "abstract": 11, "callabl": 11, "kwarg": 11, "second": 11, "extra_info": 11, "reduct": 11, "statement": 11, "outsid": 11, "happen": 11, "regard": 12, "major": 12, "compon": 12, "recommend": 13, "context": 13, "ml": 13, "often": 13, "recogn": 13, "complet": 13, "conv": 13, "qlinearconv": 13, "matmul": 13, "gemm": 13, "accelerat": 13, "incur": 13, "zero": 13, "feel": 13, "free": 13, "open": 13, "issu": 13, "yourself": 13, "rather": 13, "avoid": 13, "origin": 13, "discard": 13, "doesn": 13, "do": 13, "onnx_model": 13, "modelproto": 13, "my_model_with_internal_data": 13, "save_model": 13, "save_as_external_data": 13, "all_tensors_to_one_fil": 13, "locat": 13, "external_data_filenam": 13, "size_threshold": 13, "convert_attribut": 13, "fals": 13, "raw": 13, "specif": 13, "directori": 13, "shape_infer": 13, "my_model": 13, "inferred_model": 13, "infer_shap": 13, "my_inferred_model": 13, 
"moreov": 13, "repres": 13, "equat": 13, "small": 13, "wherea": 13, "alwai": 13, "freeli": 13, "dimension_rel": 13, "relationship": 13, "stride": 13, "filter": 13, "dilat": 13, "rate": 13, "loop_dim_s": 13, "left": 13, "hand": 13, "operand_precis": 13, "partial": 13, "o_fin": 13, "final": 13, "operand_sourc": 13, "come": 13, "constant_operand": 13, "constant": 13, "prior": 13, "none": 13, "readm": 13, "notat": 13, "batch": 13, "row": 13, "column": 13, "kernel": 13}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"zigzag": [0, 2, 6, 7, 10], "api": 0, "get_hardware_performance_zigzag": 0, "code": [1, 2], "document": [1, 2, 3, 6], "contribut": 2, "guidelin": 2, "upgrad": 2, "project": 2, "version": 2, "develop": 2, "write": 2, "new": [2, 10], "part": 2, "gener": [2, 10], "build": 2, "local": 2, "which": 2, "support": [2, 10, 13], "doxygen": 2, "futur": 3, "chang": 3, "framework": 3, "get": 4, "start": 4, "first": [4, 10], "run": 4, "analyz": 4, "result": [4, 11], "hardwar": 5, "architectur": 5, "oper": [5, 13], "unit": 5, "arrai": 5, "memori": 5, "instanc": 5, "hierarchi": 5, "core": [5, 10], "hw": 5, "acceler": 5, "model": [5, 10, 11, 13], "exampl": 5, "specif": 5, "set": 5, "refer": 5, "welcom": 6, "": [6, 13], "content": 6, "indic": 6, "tabl": 6, "instal": 7, "packag": 7, "manual": [7, 13], "clone": 7, "prerequisit": 7, "map": [8, 10, 11], "user": [8, 12], "defin": 8, "constraint": 8, "output": 9, "simplesavestag": 9, "completesavestag": 9, "creat": [9, 11], "custom": [9, 11], "savestag": 9, "public": 10, "The": [10, 11], "idea": 10, "detail": 10, "latenc": 10, "explan": 10, "tempor": [10, 11], "search": 10, "engin": 10, "differ": 10, "design": 10, "space": 10, "explor": 10, "case": 10, "studi": 10, "extens": 10, "cross": 10, "layer": [10, 13], "depth": 10, "schedul": 10, "multi": 10, "fuse": 10, "stage": 11, "introduct": 11, "main": 11, "entri": 11, "point": 11, "sequenti": 11, "call": 11, "back": 11, "pass": 11, "implement": 11, "input": 11, 
"parser": 11, "iter": 11, "plot": 11, "reduc": 11, "optim": 11, "save": [11, 13], "dump": 11, "spatial": 11, "cost": 11, "your": [11, 13], "guid": 12, "workload": 13, "onnx": 13, "extern": 13, "data": 13, "infer": 13, "an": 13, "shape": 13, "definit": 13}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"ZigZag API": [[0, "zigzag-api"]], "get_hardware_performance_zigzag()": [[0, "get-hardware-performance-zigzag"]], "Code Documentation": [[1, "code-documentation"]], "Contribute": [[2, "contribute"]], "Contributing guidelines": [[2, "contributing-guidelines"]], "Upgrading the project version (for ZigZag developers)": [[2, "upgrading-the-project-version-for-zigzag-developers"]], "Documentation": [[2, "documentation"], [3, "documentation"]], "Writing new parts for the general documentation": [[2, "writing-new-parts-for-the-general-documentation"]], "Building the general documentation locally": [[2, "building-the-general-documentation-locally"]], "Writing code which supports the code documentation with Doxygen": [[2, "writing-code-which-supports-the-code-documentation-with-doxygen"]], "Building the code documentation locally": [[2, "building-the-code-documentation-locally"]], "Future changes": [[3, "future-changes"]], "Framework": [[3, "framework"]], "Getting Started": [[4, "getting-started"]], "First run": [[4, "first-run"]], "Analyzing results": [[4, "analyzing-results"]], "Hardware Architecture": [[5, "hardware-architecture"]], "Operational Unit": [[5, "operational-unit"]], "Operational Array": [[5, "operational-array"]], "Memory Instance": [[5, "memory-instance"]], "Memory Hierarchy": [[5, "memory-hierarchy"]], "Core": [[5, "core"]], "HW Accelerator Model": [[5, "hw-accelerator-model"]], "Modelled 
examples": [[5, "modelled-examples"]], "Specific settings": [[5, "specific-settings"]], "References": [[5, "references"]], "Welcome to ZigZag\u2019s documentation!": [[6, "welcome-to-zigzag-s-documentation"]], "Contents:": [[6, null]], "Indices and tables": [[6, "indices-and-tables"]], "Installing ZigZag": [[7, "installing-zigzag"]], "Installing as a package": [[7, "installing-as-a-package"]], "Manual clone": [[7, "manual-clone"]], "Prerequisites": [[7, "prerequisites"]], "Installation": [[7, "installation"]], "Mapping": [[8, "mapping"]], "User-defined mapping constraints": [[8, "user-defined-mapping-constraints"]], "Outputs": [[9, "outputs"]], "SimpleSaveStage": [[9, "simplesavestage"]], "CompleteSaveStage": [[9, "completesavestage"]], "Creating a custom SaveStage": [[9, "creating-a-custom-savestage"]], "Publications": [[10, "publications"]], "The general idea of ZigZag": [[10, "the-general-idea-of-zigzag"]], "Detailed latency model explanation": [[10, "detailed-latency-model-explanation"]], "The new temporal mapping search engine": [[10, "the-new-temporal-mapping-search-engine"]], "Different design space exploration case studies": [[10, "different-design-space-exploration-case-studies"]], "Extension to support cross-layer depth-first scheduling": [[10, "extension-to-support-cross-layer-depth-first-scheduling"]], "Extension to support multi-core layer-fused scheduling": [[10, "extension-to-support-multi-core-layer-fused-scheduling"]], "Stages": [[11, "stages"]], "Introduction": [[11, "introduction"]], "The main entry point": [[11, "the-main-entry-point"]], "The sequential call of stages": [[11, "the-sequential-call-of-stages"]], "The back passing of results": [[11, "the-back-passing-of-results"]], "Implemented stages": [[11, "implemented-stages"]], "Input parser stages": [[11, "input-parser-stages"]], "Iterator stage": [[11, "iterator-stage"]], "Plot stages": [[11, "plot-stages"]], "Reduce stages": [[11, "reduce-stages"]], "Optimization stages": [[11, 
"optimization-stages"]], "Save and dump stages": [[11, "save-and-dump-stages"]], "Temporal mapping stages": [[11, "temporal-mapping-stages"]], "Spatial mapping stages": [[11, "spatial-mapping-stages"]], "Cost model stages": [[11, "cost-model-stages"]], "Creating your custom stage": [[11, "creating-your-custom-stage"]], "User Guide": [[12, "user-guide"]], "Workload": [[13, "workload"]], "Onnx models": [[13, "onnx-models"]], "Supported onnx operators": [[13, "supported-onnx-operators"]], "Saving your onnx model with external data": [[13, "saving-your-onnx-model-with-external-data"]], "Inferring an onnx model\u2019s shapes": [[13, "inferring-an-onnx-model-s-shapes"]], "Manual layer definition": [[13, "manual-layer-definition"]]}, "indexentries": {}})
\ No newline at end of file