This lab demonstrates how to modify your code to optimize the hardware-software system generated by the SDx IDE using task-level pipelining. You can observe the impact of pipelining on performance.
📌 NOTE You can complete this tutorial even if you do not have a ZC702 board. When creating the SDSoC environment project, select your board. The tutorial instructions ask you to add source files from an application written for the ZC702. If your board contains a smaller Zynq-7000 device, after adding the source files you need to edit the file mmult_accel.cpp to reduce resource usage (in the accelerator source file you will see #pragma HLS array_partition, which sets block factor=16; instead, set block factor=8).
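As a rough illustration of the edit described in the note, the sketch below shows a matrix-multiply function whose local buffers are partitioned with block factor=8. The function name, buffer names, matrix size, dim values, and the __SYNTHESIS__ guard are assumptions made for this sketch and are not taken from the sample. In your copy of mmult_accel.cpp, locate the existing array_partition lines and change only the factor value.

```cpp
#include <cassert>

constexpr int N = 32;  // assumed matrix dimension, for illustration only

// Hypothetical stand-in for an accelerator body; not the sample's code.
void mmult_sketch(const float A[N * N], const float B[N * N], float C[N * N]) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    float a_buf[N][N];
    float b_buf[N][N];
// Vivado HLS defines __SYNTHESIS__ during synthesis. The guard is an addition
// for this sketch so that host compilers do not warn about unknown pragmas.
// A smaller factor splits each buffer into fewer memory banks, which lowers
// resource usage at the cost of fewer parallel accesses.
#ifdef __SYNTHESIS__
#pragma HLS array_partition variable=a_buf block factor=8 dim=2
#pragma HLS array_partition variable=b_buf block factor=8 dim=1
#endif
    // Copy the inputs into the local buffers.
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            a_buf[i][j] = A[i * N + j];
            b_buf[i][j] = B[i * N + j];
        }
    }
    // Plain triple-loop multiply on the local buffers.
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k) {
                sum += a_buf[i][k] * b_buf[k][j];
            }
            C[i * N + j] = sum;
        }
    }
}
```

On a host compiler the pragmas are skipped and the function behaves as an ordinary matrix multiply, which is convenient for checking results before building for hardware.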
Task Pipelining
If there are multiple calls to an accelerator in your application, then you can structure your application such that you can pipeline these calls and overlap the setup and data transfer with the accelerator computation. In the case of the matrix multiply application, the following events take place:
- Matrices A and B are transferred from the main memory to accelerator local memories.
- The accelerator executes.
- The result, C, is transferred back from the accelerator to the main memory.
The following figure illustrates the matrix multiply design on the left side and on the right side a time-chart of these events for two successive calls that are executing sequentially.
The following figure shows the two calls executing in a pipelined fashion. The data transfer for the second call starts as soon as the data transfer for the first call is finished and overlaps with the execution of the first call. To enable the pipelining, however, we need to provide extra local memory to store the second set of arguments while the accelerator is computing with the first set of arguments. The SDSoC environment generates these memories, called multi-buffers, under the guidance of the user.
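To make the time-charts concrete, the following sketch models the three events of each call (input transfer, accelerator execution, output transfer) and compares sequential and pipelined totals. It is an idealized model, not a description of what the SDSoC environment generates: the CallCost type, the function names, and the time units are hypothetical, and the model assumes that each phase uses its own resource and that enough multi-buffers exist so a transfer never waits for free local memory.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-call costs, in arbitrary time units.
struct CallCost {
    int transfer_in;   // copy A and B to accelerator local memory
    int compute;       // accelerator execution
    int transfer_out;  // copy C back to main memory
};

// Sequential: every call finishes all three phases before the next starts.
int sequential_time(const CallCost& c, int calls) {
    assert(calls >= 0);
    return calls * (c.transfer_in + c.compute + c.transfer_out);
}

// Pipelined (idealized): a phase of call k+1 may start as soon as the same
// phase of call k is done and the previous phase of call k+1 is done.
int pipelined_time(const CallCost& c, int calls) {
    assert(calls >= 0);
    int in_done = 0;       // time at which the latest input transfer finished
    int compute_done = 0;  // time at which the latest execution finished
    int out_done = 0;      // time at which the latest output transfer finished
    for (int k = 0; k < calls; ++k) {
        in_done += c.transfer_in;
        compute_done = std::max(in_done, compute_done) + c.compute;
        out_done = std::max(compute_done, out_done) + c.transfer_out;
    }
    return out_done;
}
```

For example, with costs of 2, 5, and 1 units, two sequential calls take 16 units while the pipelined model takes 13, because the second input transfer is hidden behind the first execution. The benefit is bounded by the slowest phase.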
Specifying task-level pipelining requires rewriting the calling code using the SDS pragmas async(id) and wait(id). The SDSoC environment includes a Matrix Multiply Pipelined example that demonstrates the use of these pragmas, and this tutorial uses that example.
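The general shape of such calling code can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in the sample's mmult.cpp: mmult_stub, run_pipelined, DIM, and ELEMS are hypothetical names, the stub is a software stand-in for the hardware function, and the __SDSCC__ guard assumes that sds++ defines that macro. On a host compiler the pragmas are skipped and every call runs synchronously, so the sketch only shows where the pragmas go.

```cpp
#include <algorithm>
#include <cassert>

constexpr int DIM = 2;            // tiny matrices keep the sketch short
constexpr int ELEMS = DIM * DIM;  // elements per matrix

// Hypothetical software stand-in for the hardware function.
void mmult_stub(const float* A, const float* B, float* C) {
    for (int i = 0; i < DIM; ++i) {
        for (int j = 0; j < DIM; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < DIM; ++k) {
                sum += A[i * DIM + k] * B[k * DIM + j];
            }
            C[i * DIM + j] = sum;
        }
    }
}

// Multiplies num_calls consecutive matrix pairs, keeping at most `depth`
// calls outstanding. Returns the number of calls issued.
int run_pipelined(const float* A, const float* B, float* C,
                  int num_calls, int depth) {
    assert(num_calls >= 0 && depth >= 1);
    const int in_flight = std::min(depth, num_calls);
    int issued = 0;

    // Prologue: start calls without waiting until the pipeline is full.
    for (; issued < in_flight; ++issued) {
#ifdef __SDSCC__
#pragma SDS async(1)
#endif
        mmult_stub(A + issued * ELEMS, B + issued * ELEMS, C + issued * ELEMS);
    }
    // Steady state: wait for the oldest call, then start the next one.
    for (; issued < num_calls; ++issued) {
#ifdef __SDSCC__
#pragma SDS wait(1)
#pragma SDS async(1)
#endif
        mmult_stub(A + issued * ELEMS, B + issued * ELEMS, C + issued * ELEMS);
    }
    // Epilogue: wait for the calls that are still outstanding.
    for (int k = 0; k < in_flight; ++k) {
#ifdef __SDSCC__
#pragma SDS wait(1)
#endif
    }
    return issued;
}
```

Each async(1) must eventually be matched by a wait(1) with the same id, which is why the epilogue drains as many calls as the prologue started. The depth parameter plays the role of the pipeline depth, and each outstanding call needs its own set of local buffers.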
Task Pipelining in the Matrix Multiply Example
The SDx IDE includes a matrix multiply pipelined example that demonstrates the use of async pragmas to implement task-level pipelining. This exercise allows you to see the runtime improvement that comes from using this technique.
- Create a new SDx project (lab5) by selecting File > New > SDx Project. Enter the project name lab5, select the ZC702 Platform and Linux System Configuration, and click Next.
- The Templates page appears, containing source code examples for the selected platform. From the list of application templates, select Empty Application and click Finish.
- Using your operating system file manager, navigate to <path to install>/SDx/2018.2/samples/mmult_pipelined and copy the source files in that directory (mmult_accel.cpp, mmult_accel.h, and mmult.cpp) into the src folder of the newly created project (for example, ./lab5/src).
- Right-click lab5 in SDx and select Refresh from the context menu. This adds all the sources copied in the previous step to the project.
- Change the build configuration to Release.
- Mark the function mmult_accel in the file mmult_accel.cpp for hardware using the Add HW Functions icon in the SDx Project Settings or Toggle HW/SW in the Project Explorer.
- Build the project.
- Copy the files obtained in the sd_card folder to an SD card, set up a terminal, and run the generated application on the board. You need to specify the pipeline depth as an argument to the application. Run the application with pipeline depths of 1, 2, and 3 and note the performance obtained.
After completing this tutorial, you should be able to do the following:
- Use the SDx IDE to optimize your application to reduce runtime by performing task-level pipelining.
- Observe the performance impact of pipelining calls to an accelerator so that accelerator computation overlaps with input and output communication.
Copyright© 2019 Xilinx