Please add university courses and informative videos.
- Parallel Reduction [Slides]
- GPU Memory bootcamp - Tony Scudiero [git repo]
- CUB: CUDA Collective primitives library [Git] [Slides] [Video]
- Best Practices Guide by PRACE [HTML] [PDF]
- CU2CL by vtsynergy (https://github.com/vtsynergy)
- This was shown to work on DAS5 after copying /usr/include/limits.h to $PWD and commenting out the lines around # include_next (lines 122-125):
"cu2cl-tool host_code.cc device_code.cu -- -DGPU_ON -I$PWD:/usr/include -I/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include"
- cutocl (https://github.com/benvanwerkhoven/cutocl)
- OpenCL-based libraries
- CUDA-based libraries
- Unit Testing
- Example of a unit test for a CUDA kernel using Kernel Tuner
- Comparing floating-point results
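When a unit test compares GPU output against a CPU reference, exact equality is usually too strict, because the two sides may round or order operations differently. The sketch below is only an illustration of a tolerance-based comparison using the Python standard library. The helper name `all_close`, the tolerance values, and the sample data are assumptions made for this example and are not taken from any of the linked resources.

```python
import math

def all_close(result, reference, rel_tol=1e-6, abs_tol=1e-6):
    """Return True if both sequences have equal length and every pair of
    elements agrees within the given relative or absolute tolerance.
    (Hypothetical helper; the tolerances are illustrative defaults.)"""
    if len(result) != len(reference):
        return False
    return all(
        math.isclose(r, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, e in zip(result, reference)
    )

# Exact comparison is fragile even for simple arithmetic.
exact_match = (0.1 + 0.2) == 0.3

# Stand-ins for a CPU reference and a kernel result (made-up data).
reference = [0.1 * i for i in range(8)]
gpu_like = [x + 1e-9 for x in reference]  # rounding-sized difference
wrong = [x + 1e-2 for x in reference]     # a genuine error

print("exact 0.1 + 0.2 == 0.3:", exact_match)
print("rounding-sized difference accepted:", all_close(gpu_like, reference))
print("genuine error accepted:", all_close(wrong, reference))
```

Suitable tolerances depend on the kernel: long reductions and single-precision arithmetic typically need looser bounds than element-wise double-precision code. For NumPy arrays, `numpy.testing.assert_allclose` offers a similar check with a more informative failure message.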
Resources:
- Better Performance at Lower Occupancy [Slides] [Video]
- Maxwell Tuning Guide
- Pascal Tuning Guide
Generic Auto Tuners:
- Kernel Tuner (Python)
- CLTune (C++)