Rodinia benchmarks for CUDA translated to DPC++ using the Intel DPC++ Compatibility Tool. These benchmarks can run on CPUs and on NVIDIA and Intel GPUs.
Before running the benchmarks, install the following dependencies:
- CUDA Toolkit 11.4 → Required to run on NVIDIA GPUs.
- Intel LLVM open source compiler → Can build the code for all the supported devices.
- Intel oneAPI Base Toolkit → The commercial "dpcpp" compiler can be installed alongside the LLVM compiler, but it does not support NVIDIA GPUs.
- To run the "Hybridsort" test, you also have to install some OpenGL libraries:
sudo apt-get install freeglut3 freeglut3-dev
sudo apt-get install binutils-goldc
sudo apt-get install libglew1.5
Once you have installed all the requirements, edit the file "common/make.config" and change the value of the following variables:
- CUDA_DIR → set the path where you have installed the CUDA Toolkit (e.g. /usr/local/cuda)
- LLVM_DIR → set the location where you have installed the LLVM compiler (e.g. ~/sycl_workspace/llvm/build)
- ONEAPI_DIR → set the location where you have installed the oneAPI Base Toolkit (e.g. /opt/intel/oneapi)
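Before editing "common/make.config", it can help to confirm that each directory actually exists. The following is a minimal sketch of such a check; the check_dir function is a hypothetical helper and not part of this repository, and the paths in the usage comments are only the example values from the list above.

```shell
# Hypothetical helper (not part of the repository): report whether a
# toolchain directory exists before writing it into common/make.config.
check_dir() {
  # $1 = variable name used in common/make.config
  # $2 = candidate installation path
  if [ -d "$2" ]; then
    echo "$1 ok: $2"
  else
    echo "$1 missing: $2"
    return 1
  fi
}

# Usage (example paths from the list above; replace them with your own):
#   check_dir CUDA_DIR /usr/local/cuda
#   check_dir LLVM_DIR "$HOME/sycl_workspace/llvm/build"
#   check_dir ONEAPI_DIR /opt/intel/oneapi
```

A "missing" line means the corresponding variable needs a different path, or that the dependency is not installed yet.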
There are two Makefiles to build the benchmarks: one for the CUDA versions and one for the DPC++ versions.
Move to the cuda folder and invoke the make command with the following argument:
- time=<0,1> → Set to 1 to print the time spent on the device.
Example:
cd cuda
make time=1
Move to the dpcpp folder and invoke the make command with the following arguments:
- time=<0,1> → Set to 1 to print the time spent on the device.
- DPCPP_ENV=<clang,oneapi> → The "clang" option uses the LLVM compiler, while the "oneapi" option uses the oneAPI compiler.
- DEVICE=<CPU,INTEL_GPU,NVIDIA_GPU> → Selects the device where the code runs. The "NVIDIA_GPU" option only works together with "DPCPP_ENV=clang".
The following example compiles the benchmarks with the LLVM compiler, selects the NVIDIA GPU, and enables printing of the GPU time:
cd dpcpp
make DPCPP_ENV=clang DEVICE=NVIDIA_GPU time=1
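Because "NVIDIA_GPU" is only usable with the LLVM compiler, a small guard can catch an invalid combination before a long build starts. This is a sketch under that stated constraint; valid_combo is a hypothetical helper and not part of the provided Makefiles. It only prints the make command to run, so you can review it first.

```shell
# Hypothetical helper (not part of the repository): validate a
# DPCPP_ENV/DEVICE pair and print the matching make command.
valid_combo() {
  dpcpp_env="$1"
  device="$2"

  # Accept only the compiler options documented above.
  case "$dpcpp_env" in
    clang|oneapi) ;;
    *) echo "unknown DPCPP_ENV: $dpcpp_env"; return 2 ;;
  esac

  # Accept only the device options documented above.
  case "$device" in
    CPU|INTEL_GPU|NVIDIA_GPU) ;;
    *) echo "unknown DEVICE: $device"; return 2 ;;
  esac

  # The oneAPI compiler does not support NVIDIA GPUs.
  if [ "$device" = "NVIDIA_GPU" ] && [ "$dpcpp_env" != "clang" ]; then
    echo "NVIDIA_GPU requires DPCPP_ENV=clang"
    return 1
  fi

  echo "make DPCPP_ENV=$dpcpp_env DEVICE=$device time=1"
}
```

For instance, "valid_combo oneapi INTEL_GPU" prints "make DPCPP_ENV=oneapi DEVICE=INTEL_GPU time=1", while "valid_combo oneapi NVIDIA_GPU" reports the conflict and returns a non-zero status.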
You can run the benchmarks one by one, or use the provided scripts ("time_cuda.sh", "time_dpcpp.sh"), which save the kernel times in a "timing" folder. The scripts require that the benchmarks were compiled with the "time=1" argument.
The following benchmarks do not work in DPC++:
- Hybridsort
- Kmeans
- Leukocyte
- Mummergpu
- Germán Castaño, Youssef Faqir-Rhazoui, Carlos García and Manuel Prieto-Matías (2022). "Evaluation of Intel’s DPC++ Compatibility Tool in Heterogeneous Computing." Journal of Parallel and Distributed Computing, volume 165, pages 120-129.
This work has been supported by the EU (FEDER) and the Spanish MINECO and CM under grants S2018/TCS-4423 and RTI2018-093684-B-I00.