Deploying to gh-pages from @ 2c07997 🚀
LY-Mei committed Oct 11, 2023
1 parent 17c6b11 commit bb529bf
Showing 5 changed files with 24 additions and 11 deletions.
Binary file modified .doctrees/environment.pickle
Binary file modified .doctrees/hardware.doctree
31 changes: 22 additions & 9 deletions _sources/hardware.rst.txt
@@ -14,7 +14,7 @@ Accelerating inference of a NN requires the execution of multiplications and sum
The operational unit object has the following attributes:

* **input_precision**: List of input operand (data) precisions in number of bits, one per input operand (typically there are two input operands for a Multiplier).
- * **output_precision**: The bit precision of the operation's output.
+ * **output_precision**: The bit precision of the operation's output (e.g., for a multiplier, the output_precision is automatically set to the sum of the two input operands' precisions).
* **energy_cost**: Energy of executing a single operation (e.g., a multiplication).
* **area**: The HW area overhead of a single operational unit (e.g., a multiplier).
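As a concrete illustration, a multiplier unit with these attributes could be instantiated as in the following sketch. The `Multiplier` class, its import path and the keyword names are assumptions inferred from the attribute list above, not taken from this page.

.. code-block:: python

    # Sketch only: class name, import path and keyword names are assumed
    # from the attribute list above and may differ in the framework itself.
    from zigzag.classes.hardware.architecture.operational_unit import Multiplier

    multiplier = Multiplier(
        input_precision=[8, 8],  # two 8-bit input operands
        energy_cost=0.04,        # energy per multiplication (e.g., in pJ)
        area=1,                  # area per multiplier (arbitrary unit)
    )
    # output_precision is auto-set to 8 + 8 = 16 bits.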

@@ -31,20 +31,20 @@ The array can have one or multiple dimensions, each with a size. The importance
The operational array object has the following attributes:

* **operational_unit**: The operational unit from which the array is built.
- * **dimensions**: The dimensions of the array. This should be defined as a dict, with the keys being the identifier of each dimension of the array (typically 'D1', 'D2', ...) and the values being the size of this dimension (i.e. the size of the array along that dimension).
+ * **dimensions**: The dimensions of the array. This should be defined as a Python dictionary, with the keys being the identifier of each dimension of the array (typically 'D1', 'D2', ...) and the values being the size of the array along that dimension.
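Continuing the sketch above, the multiplier unit can be unrolled into a two-dimensional array. Again, the `MultiplierArray` class and its import path are assumptions; only the format of the `dimensions` dictionary follows from the description above.

.. code-block:: python

    # Sketch only: class name and import path are assumed.
    from zigzag.classes.hardware.architecture.operational_array import MultiplierArray

    dimensions = {"D1": 3, "D2": 4}  # a 3 x 4 array, i.e. 12 multipliers in total
    multiplier_array = MultiplierArray(multiplier, dimensions)  # 'multiplier' from the sketch above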


Memory Instance
---------------

- In order to store the different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. The instances define how big each memory is in terms of capacity and area, what the cost of writing and reading from these memories is, what its bandwidth is, and how many read/write/read-write ports it includes.
+ In order to store different activations and weights used for the computations in the operational array, different memory instances are attached in a hierarchical fashion. The instances define how big each memory is in terms of capacity and area, what the cost of writing to and reading from it is, what its bandwidth is, and how many read/write/read-write ports it includes.

.. image:: images/hardware-architecture/memory-instance.jpg
:width: 400

The memory instance object has the following attributes:

- * **name**: A name for the instance
+ * **name**: A name for the instance.
* **size**: The memory size in bits.
* **r_bw/w_bw**: A read or write bandwidth in the number of bits per cycle.
* **r_cost/w_cost**: A read or write energy cost.
@@ -54,19 +54,32 @@ The memory instance object has the following attributes:

(optional)

- * **min_r_granularity/min_w_granularity**: The minimal memory read/write granularity (in bit) the memory supports. This attribute is used to better model the memory that supports half-word access or quarter-word access patterns. For example, if a memory's read bandwidth (wordlength) is 256 bit/cycle, its read energy (r_cost) is 100, and its min_r_granularity is 128 bits (i.e., assume this memory allow half-word read), read 128 bits from it (we approximatlly assume that) will only take 50 energy. While if min_r_granularity is not defined or is defined as 256 bits, read 128 bits from it will take 100 energy.
+ * **min_r_granularity/min_w_granularity**: The minimal memory read/write granularity (in bits) the memory supports. This attribute is used to better model memories that support half-word or quarter-word access patterns. For example, if a memory's read bandwidth (word length) is 256 bits/cycle, its read energy (r_cost) is 100, and its min_r_granularity is 128 bits (i.e., the memory allows half-word reads), then reading 128 bits from it is approximated to take only 50 energy. If min_r_granularity is not defined (or is defined as 256 bits), reading 128 bits from it will take 100 energy.
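As an example, a small register-file-like instance with the attributes above might be declared as in the sketch below. The `MemoryInstance` class, its import path and the port-count keywords (`r_port`, `w_port`, `rw_port`) are assumptions, and the numerical values are arbitrary placeholders chosen to match the bandwidth/granularity example above.

.. code-block:: python

    # Sketch only: class name, import path and keyword names are assumed;
    # the numbers are placeholders matching the example in the text above.
    from zigzag.classes.hardware.architecture.memory_instance import MemoryInstance

    reg_128B = MemoryInstance(
        name="reg_128B",
        size=128 * 8,                    # capacity in bits (128 bytes)
        r_bw=256, w_bw=256,              # read/write bandwidth in bits per cycle
        r_cost=100, w_cost=120,          # read/write energy per access
        r_port=1, w_port=1, rw_port=0,   # number of ports of each type (assumed keywords)
        min_r_granularity=128,           # half-word reads supported: a 128-bit read costs ~50
    )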

Memory Hierarchy
----------------

- Besides knowing what the specs of each memory instance are, the memory hierarchy encodes information with respect to the interconnection of the memories to the operational array, and to the other memory instances.
- This interconnection is achieved through multiple calls to the `add_memory()`, where the first call(s) adds the first level of memories, which connects to the operational array, and later calls connect to the lower memory levels. This builds a hierarchy of memories.
+ Besides knowing what the specs of each memory instance are, the memory hierarchy encodes how each memory is interconnected with the operational array and with the other memory instances.
+ This interconnection is achieved through multiple calls to `add_memory()`, where the first call(s) adds the first level of memories, which connects to the operational array, and later calls stack higher memory levels on top of the lower ones. This builds a hierarchy of memories.

- To know if the memory should connect to the operational array or another lower memory level, it needs to know which data will be stored within the memories. To decouple the algorithmic side from the hardware side, this is achieved through the concept of 'memory operands' (as opposed to 'algorithmic operands which are typicall the I/O activations and weights W). You can think of the memory operands as virtual operands, which will later be linked to the actual algorithmic operands in the mapping file through the `memory_operand_links` attribute.
+ To know whether a memory should connect to the operational array or to another, lower memory level, the framework needs to know which data will be stored within the memories. To decouple the algorithmic side from the hardware side, this is achieved through the concept of 'memory operands' (as opposed to 'algorithmic operands', which are typically the Input/Output activations and the weights W). You can think of the memory operands as virtual operands, which will later be linked to the actual algorithmic operands in the mapping file through the `memory_operand_links` attribute.

Similarly to how the operational unit can be unrolled (forming an operational array), the memories can also be unrolled, where each memory accompanies either a single operational unit or all the operational units in one or more dimensions of the operational array. This is encoded through the `served_dimensions` attribute, which specifies if a single memory instance of this memory level serves all operational units in that dimension. This should be a set of one-hot-encoded tuples.

- Lastly, the different read/write/read-write ports a memory instance has, are assigned to the different data movevements possible in the hierarchy. There are four types of data movements in a hierarchy: from high (*fh*), to high (*th*), from low (*fl*), to low (*tl*). At the time of writing, these can be manually linked to one of the read/write/read-write ports through the following syntax: `{port_type}_port_{port_number}`, *port_type* being *r*, *w* or *rw* and *port_number* equal to the port number, starting from 1, which allows to allocate multiple ports of the same type. Alternatively, these are automatically generated as a default if not probided to the `add_memory()` call.
+ For example, assume an operational array with 2 dimensions: {D1: 3, D2: 4}. There are four common `served_dimensions` settings for a memory level (sketched in code after the list):

+ 1. "None" or {(0, 0)}: the memory does not serve any array dimension, meaning the memory is unrolled together with each operational unit, i.e., there are in total 12 such memory instances.
+ 2. {(1, 0)}: the memory serves array dimension D1, meaning the memory is unrolled along D2, and each memory instance serves all 3 operational units along D1, i.e., there are in total 4 such memory instances.
+ 3. {(0, 1)}: the memory serves array dimension D2, meaning the memory is unrolled along D1, and each memory instance serves all 4 operational units along D2, i.e., there are in total 3 such memory instances.
+ 4. "All" or {(1, 0), (0, 1)}: the memory serves all array dimensions, both D1 and D2, meaning the memory is not unrolled at all and a single instance serves all operational units, i.e., there is in total 1 such memory instance.
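The four settings could be written as follows. This is a sketch: the set-of-tuples form follows the one-hot encoding described above, while the exact spelling the framework accepts for the "None"/"All" shorthands is an assumption to verify against the actual API.

.. code-block:: python

    # served_dimensions options for the {D1: 3, D2: 4} array; tuples are one-hot over (D1, D2).
    served_none = {(0, 0)}         # unrolled with every operational unit -> 12 instances
    served_d1 = {(1, 0)}           # each instance serves all 3 units along D1 -> 4 instances
    served_d2 = {(0, 1)}           # each instance serves all 4 units along D2 -> 3 instances
    served_all = {(1, 0), (0, 1)}  # a single instance serves the whole array -> 1 instance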

+ Lastly, the different read/write/read-write ports a memory instance has are assigned to the different data movements possible in the hierarchy. There are four types of data movements for a memory in the hierarchy: from high (*fh*), to high (*th*), from low (*fl*), to low (*tl*).

+ - **fh**: from high, meaning the data is provided by the higher level of memory to be **written** to the current level of memory
+ - **th**: to high, meaning the data is **read** out from the current level of memory to go to the higher level of memory
+ - **fl**: from low, meaning the data is provided by the lower level of memory to be **written** to the current level of memory
+ - **tl**: to low, meaning the data is **read** out from the current level of memory to go to the lower level of memory

+ At the time of writing, these can be manually linked to one of the read/write/read-write ports through the following syntax: `{port_type}_port_{port_number}`, with *port_type* being *r*, *w* or *rw* and *port_number* equal to the port number (starting from 1), which allows allocating multiple ports of the same type. If they are not provided to the `add_memory()` call, a default allocation is generated automatically.
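Putting the pieces together, adding the register file from the sketches above as the first memory level might look as follows. The `MemoryHierarchy` class, its import path and the keyword names (`operands`, `port_alloc`, `served_dimensions`) are assumptions based on this description; the port dictionaries simply link each data movement (*fh*, *th*, *fl*, *tl*) to a port name following the `{port_type}_port_{port_number}` syntax.

.. code-block:: python

    # Sketch only: class name, import path and keyword names are assumed.
    from zigzag.classes.hardware.architecture.memory_hierarchy import MemoryHierarchy

    memory_hierarchy = MemoryHierarchy(operational_array=multiplier_array)

    # Attach 'reg_128B' (defined above) directly to the operational array.
    # It stores the memory operands 'I1' and 'I2' (virtual operands, linked to
    # the algorithmic operands later in the mapping file); one port dict per operand.
    memory_hierarchy.add_memory(
        memory_instance=reg_128B,
        operands=("I1", "I2"),
        port_alloc=(
            {"fh": "w_port_1", "tl": "r_port_1", "fl": None, "th": None},  # I1
            {"fh": "w_port_1", "tl": "r_port_1", "fl": None, "th": None},  # I2
        ),
        served_dimensions={(0, 0)},  # one register file per operational unit
    )
    # Higher levels (e.g., an on-chip SRAM serving the whole array) would be added
    # with further add_memory() calls, e.g., with served_dimensions={(1, 0), (0, 1)}.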

Internally, the MemoryHierarchy object extends the `NetworkX DiGraph <https://networkx.org/documentation/stable/reference/classes/digraph.html>`_ object, so its methods are available.

2 changes: 1 addition & 1 deletion hardware.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
