
Update adoc docs/bib
jerryz123 committed Nov 12, 2024
1 parent d45889e commit b3decdc
Showing 5 changed files with 24 additions and 9 deletions.
docs/background.adoc: 2 changes (1 addition, 1 deletion)
@@ -205,7 +205,7 @@ For instance, complex numbers and image pixel data are conventionally stored in
These instructions can significantly reduce programmer burden, and thus performant RVV implementations should not impose excessive performance overhead on their execution.
Vector code which uses these memory operations to reduce dynamic instruction count should perform no worse than the equivalent code which explicitly transforms the data over many vector instructions.

-=== Comparing Short-Vector Units
+=== Short-Vector Execution

Saturn's instruction scheduling mechanism differentiates it from comparable archetypes of data-parallel microarchitectures.
Fundamentally, Saturn relies on efficient dynamic scheduling of short-chime short vectors, without resorting to costly register renaming.
docs/execute.adoc: 8 changes (4 additions, 4 deletions)
@@ -54,7 +54,7 @@ If there are no structural hazards from non-pipelined functional units or regist
An instruction will depart a sequencer along with the last operation it sequences, eliminating dead time between successive vector instructions.

Notably, the sequencers enact "fire-and-forget" operation issue.
-Once an operation is issued by a sequencer, it is guaranteed to be free of further structural or data hazards as it proceeds down the pipelined VFU datapaths.
+Once an operation is issued by a sequencer, it is guaranteed to be free of further structural or data hazards as it proceeds down the pipelined functional unit datapaths.
This eliminates the need for costly operand or result queues and obviates back-pressure in the functional unit pipelines.
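For illustration, a minimal Chisel sketch of such a valid-only (no back-pressure) handoff from a sequencer into a pipelined functional unit; the bundle fields, names, and pipeline depth here are assumptions for the example, not Saturn's actual RTL.

[source,scala]
----
import chisel3._
import chisel3.util._

// Hypothetical sequenced micro-op: operands are already read from the VRF.
class SeqMicroOp extends Bundle {
  val opcode = UInt(7.W)
  val rs1    = UInt(64.W)
  val rs2    = UInt(64.W)
  val eidx   = UInt(8.W)   // element-group index for the eventual writeback
}

// The functional unit exposes a Valid (not Decoupled) input: there is no
// ready signal, so the sequencer's issue decision is final and the pipeline
// never applies back-pressure or buffers operands in queues.
class FireAndForgetFU(depth: Int = 3) extends Module {
  val io = IO(new Bundle {
    val in  = Input(Valid(new SeqMicroOp))
    val out = Output(Valid(new SeqMicroOp))
  })
  // Every accepted operation marches through `depth` stages unconditionally.
  io.out := Pipe(io.in, depth)
}
----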


@@ -78,8 +78,8 @@ To align with the data order expected in the segment buffers in the VLSU, the se
The execute sequencers (VXSs) sequence all arithmetic operations.
They track up to three register operands, with up to four reads and one write per operation (for a masked FMA).
Each VXS issues to a single vector execution unit (VXU).
-A VXU is a collection of vector functional units (VFUs).
-The VXSs will stall operation execution if the requested VFU within its VXU is unavailable.
+A VXU is a collection of vector functional units.
+The VXSs will stall operation execution if the requested functional unit within its VXU is unavailable.
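For illustration, a sketch of how a VXS might gate issue on the availability of the requested functional unit within its VXU; the port names and the fixed functional-unit count are assumptions for the example.

[source,scala]
----
import chisel3._

class VxsIssueGate(nFUs: Int = 4) extends Module {
  val io = IO(new Bundle {
    val opValid   = Input(Bool())        // sequencer has an operation ready
    val fuSel     = Input(UInt(nFUs.W))  // one-hot: which FU this op needs
    val fuBusy    = Input(UInt(nFUs.W))  // busy bits from non-pipelined FUs
    val issueFire = Output(Bool())       // operation actually issues this cycle
  })
  // Stall only when the specific functional unit this operation targets is
  // busy; other busy FUs in the same VXU do not block issue.
  val structuralHazard = (io.fuSel & io.fuBusy).orR
  io.issueFire := io.opValid && !structuralHazard
}
----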


==== Special Sequencer
@@ -124,7 +124,7 @@ The age filter restricts the pending-read and pending-write vectors to only pend

In some cases, the relative age is unambiguous, so no age filter is needed.
Instructions in a sequencer are inherently older than instructions in that sequencer's feeding issue queue.
-Sequenced operations in the VFUs are inherently the oldest writes to any element group, so no age filter is needed for these either.
+Sequenced operations in the functional units are inherently the oldest writes to any element group, so no age filter is needed for these either.

Each sequencer computes the element groups that will be accessed or written to by the next operation to be issued, and determines if a pending older read or write to those element groups would induce a RAW, WAR, or WAW hazard.
If there is no data hazard and no structural hazard, the operation can be issued; the sequencer then increments its internal element index counter or, on the last operation, drains the instruction.
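The check itself reduces to bitwise intersection of element-group masks. A minimal Chisel sketch, assuming a 64-entry element-group space and already age-filtered pending vectors (widths and names are assumptions):

[source,scala]
----
import chisel3._

class ElementGroupHazardCheck(nEgs: Int = 64) extends Module {
  val io = IO(new Bundle {
    val nextReads     = Input(UInt(nEgs.W))  // element groups the next op will read
    val nextWrites    = Input(UInt(nEgs.W))  // element groups the next op will write
    val pendingReads  = Input(UInt(nEgs.W))  // age-filtered older pending reads
    val pendingWrites = Input(UInt(nEgs.W))  // age-filtered older pending writes
    val hazard        = Output(Bool())
  })
  val raw = (io.nextReads  & io.pendingWrites).orR  // read-after-write
  val war = (io.nextWrites & io.pendingReads ).orR  // write-after-read
  val waw = (io.nextWrites & io.pendingWrites).orR  // write-after-write
  io.hazard := raw || war || waw
}
----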
docs/frontend.adoc: 2 changes (1 addition, 1 deletion)
@@ -157,7 +157,7 @@ We observe that this requirement has minimal impact on most vector codes, as sca

=== Interface to VU and VLSU

-The `VectorIssueInst` bundle presented to the VU and VLSU contains the instruction bits, scalar operands, and current `vtype`/`vstart`/`vl` settings for this instruction.
+The micro-op presented to the VU and VLSU contains the instruction bits, scalar operands, and current `vtype`/`vstart`/`vl` settings for this instruction.
For memory operations, this bundle also provides the physical page index of the accessed page for this instruction, since the PFC and IFC crack vector memory instructions into single-page accesses.
For segmented instructions where a segment crosses a page, `segstart` and `segend` bits are additionally included in the bundle, to indicate which slice of a segment resides in the current page.
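For illustration, a sketch of the kind of bundle these fields could form; the field names and widths below are assumptions, not the generator's actual definition.

[source,scala]
----
import chisel3._

class VectorMicroOp extends Bundle {
  val inst     = UInt(32.W)  // raw instruction bits
  val rs1_data = UInt(64.W)  // scalar operands captured at issue
  val rs2_data = UInt(64.W)
  val vconfig  = UInt(32.W)  // current vtype encoding
  val vstart   = UInt(8.W)
  val vl       = UInt(9.W)
  // Memory operations only: the PFC/IFC crack accesses into single-page
  // requests, so one physical page index suffices per micro-op.
  val page     = UInt(32.W)
  // Segmented accesses that straddle a page boundary: which slice of the
  // segment lives in the current page.
  val segstart = UInt(3.W)
  val segend   = UInt(3.W)
}
----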

docs/refs.bib: 16 changes (15 additions, 1 deletion)
@@ -328,7 +328,8 @@ @article{fang2022lem
author = {Fang, Zitao},
year = {2022},
number = {EECS-2022-150},
-publisher = {EECS Department, University of California, Berkeley}
+publisher = {EECS Department, University of California, Berkeley},
+url={https://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-150.pdf}
}

@article{frison2018blasfeo,
@@ -676,6 +677,19 @@ @phdthesis{SCALE
file = {/Users/tianruiwei/Zotero/storage/M4ALTJ8E/Krashinsky - 2007 - Vector-thread architecture and implementation.pdf}
}

+@article{chipyard,
+author={Amid, Alon and Biancolin, David and Gonzalez, Abraham and Grubb, Daniel and Karandikar, Sagar and Liew, Harrison and Magyar, Albert and Mao, Howard and Ou, Albert and Pemberton, Nathan and Rigge, Paul and Schmidt, Colin and Wright, John and Zhao, Jerry and Shao, Yakun Sophia and Asanovi\'{c}, Krste and Nikoli\'{c}, Borivoje},
+journal={IEEE Micro},
+title={Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs},
+year={2020},
+volume={40},
+number={4},
+pages={10-21},
+doi={10.1109/MM.2020.2996616},
+ISSN={1937-4143},
+url={https://ieeexplore.ieee.org/document/9099108}
+}

@inproceedings{scoreboard,
title = {Parallel Operation in the Control Data 6600},
booktitle = {Proceedings of the {{October}} 27-29, 1964, Fall Joint Computer Conference, Part {{II}}: {{Very}} High Speed Computer Systems},
docs/system.adoc: 5 changes (3 additions, 2 deletions)
@@ -1,7 +1,7 @@
[[system]]
== System Overview

-Saturn is implemented using the Chisel hardware description language cite:[chisel_paper], and is intended to integrate into existing Chisel-based cores as part of the open-source Chipyard SoC framework.
+Saturn is implemented using the Chisel hardware description language cite:[chisel_paper], and is intended to integrate into existing Chisel-based cores as part of the open-source Chipyard cite:[chipyard] SoC framework.
The generator is a parameterized Scala program that uses the Chisel embedded DSL to generate a synthesizable RTL circuit given a user-defined configuration.
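As a minimal sketch of this generator pattern, a plain Scala configuration object determines the elaborated hardware; the parameter class and toy datapath below are hypothetical, not Saturn's actual parameterization.

[source,scala]
----
import chisel3._

// Hypothetical configuration case class.
case class VectorConfig(vLen: Int = 512, dLen: Int = 128)

class ToyVectorDatapath(cfg: VectorConfig) extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(cfg.dLen.W))   // one DLEN-wide beat in
    val out = Output(UInt(cfg.dLen.W))  // one DLEN-wide beat out
  })
  // The same Scala source elaborates different RTL for different
  // configurations, e.g. VectorConfig(vLen = 256, dLen = 64).
  io.out := RegNext(io.in + 1.U)
}
----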

=== Organization
@@ -22,7 +22,8 @@ The *Vector Load-Store Unit (VLSU)* performs vector address generation and memor
Inflight vector memory instructions are tracked in the vector load-instruction-queue (VLIQ) and store-instruction-queue (VSIQ).
The load/store paths within the VLSU execute independently and communicate with the VU through load-response and store-data ports.

-The *Vector Datapath (VU)* contains instruction issue queues (VIQs), vector sequencers (VXS/VLS/VSS), the vector register file (VRF), and the SIMD arithmetic functional units (VEUs/VFUs).
+The *Vector Datapath (VU)* contains instruction issue queues (VIQs), vector sequencers (VXS/VLS/VSS), the vector register file (VRF), and the SIMD arithmetic functional units.
+The functional units (VFUs) are arranged in execution unit clusters (VEUs), where each VEU is fed by one sequencer.
The sequencers schedule register read/write and issue operations into the VEUs, while interlocking on structural and data hazards.
The VU is organized as a unified structure with a SIMD datapath, instead of distributing the VRF and VEUs across vector lanes.
This approach is better suited for compact designs, where scalability to ultra-wide datapaths is less of a concern.
