Compiling blas3.cu with clang++ #245
Comments
The way to fix this with nvcc is to specify the correct arch (e.g. [...]). Btw: the fast BLAS3 kernels in ViennaCL are in the OpenCL backend. I haven't backported them to the CUDA backend yet.
That worked! Thanks!
But aren't these kernels part of the CUDA backend?
Yes, they are. But these are not as fast as the kernels generated by the OpenCL backend.
I modified the blas3.cpp file to default to OPENCL_MEMORY when VIENNACL_WITH_OPENCL is defined, and added a new "blas3-ocl" target to compile the example for OpenCL.
For matrices of size 128x16384 (A) and 16384x128 (B), the OpenCL code runs slower than the CUDA code. System setup: [...] Can the slowdown be attributed to the NVIDIA GPU?
I'd like to compile blas3.cu with clang++ (yes, clang++ can compile CUDA) instead of nvcc to compare the performance of the prod kernels each compiler produces. I've built clang and llvm from source on the release50 branch of each repository and tried building the program with:
clang++ -DVIENNACL_WITH_CUDA -I/home/seabed/Software/viennacl-dev -I/usr/local/cuda/include ../examples/tutorial/blas3.cu -o examples/tutorial/blas3-clang-cuda -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -lpthread -lboost_chrono -lboost_date_time -lboost_serialization -lboost_system -lboost_thread -lboost_atomic -lpthread -O3 -Xcuda-ptxas "-O3 -m64 -fmad true"
The command failed, reporting __shfl_xor as an undefined intrinsic.
The error persists even after including the header files that declare and define the intrinsic.
Please suggest corrections for this strategy.
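For context: __shfl_xor is a warp shuffle intrinsic that is only available on devices of compute capability 3.0 or higher, so it is reported as undefined when the device code is compiled for an older target architecture. As a rough illustration of what the intrinsic does, here is a minimal host-side sketch in plain C++ that emulates the butterfly exchange on the CPU. The names shfl_xor_emulated and warp_sum_emulated are hypothetical, and this is not ViennaCL's kernel code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // lanes per warp on NVIDIA GPUs
using Warp = std::array<float, kWarpSize>;

// Emulates one __shfl_xor step on the host (hypothetical helper):
// every lane reads the value currently held by lane (lane XOR mask).
Warp shfl_xor_emulated(const Warp& lanes, std::size_t mask) {
    assert(mask < kWarpSize);
    Warp out{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        out[lane] = lanes[lane ^ mask];
    }
    return out;
}

// Butterfly reduction: after log2(32) = 5 exchange steps,
// every lane holds the sum over all 32 lanes.
Warp warp_sum_emulated(Warp lanes) {
    for (std::size_t mask = kWarpSize / 2; mask > 0; mask /= 2) {
        const Warp partner = shfl_xor_emulated(lanes, mask);
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            lanes[lane] += partner[lane];
        }
    }
    return lanes;
}
```

Calling warp_sum_emulated on an array of per-lane values returns an array in which each entry is the total, which mirrors how a device-side reduction uses the shuffle to avoid shared memory.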