- Split the memory management (`CudaMatrix`) from the CUBLAS invocation (`CudaPipeline`)
- Moved all the allocations to the smart pointers inside `CudaMatrix`
- Removed unused headers
- Smart pointers to handle CUDA resources (see the smart-pointer sketch after this list)
- New `CudaMatrix` class
- Use `Eigen::MatrixXd`
- Check the available GPU memory before computing (see the memory-check sketch after this list)
- Template class; only the `double` implementation is available
- Triple tensor product
- Shapes struct
- Tensor-matrix multiplication using `gemmBatched` (see the batched-product sketch after this list).
- Asynchronous memory copies.
- Properly free memory after the tensor operation is done.
- Use a template function to perform matrix-matrix multiplication with CUBLAS (see the templated `gemm` sketch after this list).
- Use either pinned (default) or pageable memory, see CUDA optimizations (pinned-memory sketch after this list).
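
For the smart-pointer handling of CUDA resources, a minimal sketch of the idea is below: a `std::unique_ptr` with a custom deleter owns the device buffer, so `CudaMatrix` never calls `cudaFree` by hand. The `CudaDeleter` and `allocate_device` names are illustrative, not the PR's exact code.

```cpp
#include <cuda_runtime.h>

#include <memory>
#include <stdexcept>

// Deleter that returns device memory to the driver when the owner goes out of scope
struct CudaDeleter {
  void operator()(double *ptr) const { cudaFree(ptr); }
};
using DevicePtr = std::unique_ptr<double, CudaDeleter>;

// Allocate a rows x cols block of doubles on the GPU, owned by a smart pointer
inline DevicePtr allocate_device(std::size_t rows, std::size_t cols) {
  double *raw = nullptr;
  cudaError_t err =
      cudaMalloc(reinterpret_cast<void **>(&raw), rows * cols * sizeof(double));
  if (err != cudaSuccess) {
    throw std::runtime_error(cudaGetErrorString(err));
  }
  return DevicePtr(raw);
}
```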
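The pre-computation memory check can be done with `cudaMemGetInfo`; a sketch follows, where the throw-on-insufficient-memory policy and the function name are assumptions.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <stdexcept>

// Query the free device memory and bail out before allocating if it is not enough
inline void throw_if_not_enough_gpu_memory(std::size_t bytes_needed) {
  std::size_t free_bytes = 0;
  std::size_t total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);
  if (bytes_needed > free_bytes) {
    throw std::runtime_error("Not enough memory available on the GPU");
  }
}
```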
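The batched tensor-matrix multiplication maps naturally onto `cublasDgemmBatched`: each slice of the rank-3 tensor is multiplied by the same matrix in a single call. The sketch below assumes column-major storage and illustrative names; error checking is omitted for brevity.

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

#include <vector>

// dA: device pointers to the m x k tensor slices, dB: a single k x n device matrix,
// dC: device pointers to the m x n result slices (all column-major).
void batched_tensor_matrix_product(cublasHandle_t handle,
                                   const std::vector<const double *> &dA,
                                   const double *dB,
                                   const std::vector<double *> &dC,
                                   int m, int n, int k) {
  int batch = static_cast<int>(dA.size());
  std::vector<const double *> hB(batch, dB);  // the same B is reused for every slice

  // gemmBatched expects the arrays of pointers themselves to live on the device
  const double **dA_array = nullptr;
  const double **dB_array = nullptr;
  double **dC_array = nullptr;
  cudaMalloc(reinterpret_cast<void **>(&dA_array), batch * sizeof(double *));
  cudaMalloc(reinterpret_cast<void **>(&dB_array), batch * sizeof(double *));
  cudaMalloc(reinterpret_cast<void **>(&dC_array), batch * sizeof(double *));
  cudaMemcpy(dA_array, dA.data(), batch * sizeof(double *), cudaMemcpyHostToDevice);
  cudaMemcpy(dB_array, hB.data(), batch * sizeof(double *), cudaMemcpyHostToDevice);
  cudaMemcpy(dC_array, dC.data(), batch * sizeof(double *), cudaMemcpyHostToDevice);

  const double alpha = 1.0;
  const double beta = 0.0;
  cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha, dA_array, m,
                     dB_array, k, &beta, dC_array, m, batch);

  // Wait for the batched product to finish, then free the temporary pointer arrays
  cudaDeviceSynchronize();
  cudaFree(dA_array);
  cudaFree(dB_array);
  cudaFree(dC_array);
}
```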
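For the templated matrix-matrix multiplication, one way to keep the interface generic while providing only the `double` path is an explicit specialization that forwards to `cublasDgemm`; this is a sketch with assumed names, not necessarily the PR's implementation. `Eigen::MatrixXd` is column-major by default, which matches the layout CUBLAS expects, so `mat.data()` can be copied straight into the device buffers.

```cpp
#include <cublas_v2.h>

// Generic template is declared but not defined, so only double instantiations link
template <typename T>
cublasStatus_t gemm(cublasHandle_t handle, int m, int n, int k, const T *A,
                    const T *B, T *C);

// double specialization: C = A * B with column-major, non-transposed operands
template <>
cublasStatus_t gemm<double>(cublasHandle_t handle, int m, int n, int k,
                            const double *A, const double *B, double *C) {
  const double alpha = 1.0;
  const double beta = 0.0;
  return cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha, A, m, B, k,
                     &beta, C, m);
}
```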
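Pinned host memory is what lets `cudaMemcpyAsync` actually overlap with computation; with pageable memory the transfer typically goes through a staging buffer and behaves synchronously with respect to the host. A sketch of the pinned/pageable choice and an asynchronous copy, with illustrative function names:

```cpp
#include <cuda_runtime.h>

#include <cstddef>

// Allocate a host staging buffer; pinned (page-locked) memory is the default
inline double *allocate_host(std::size_t count, bool pinned = true) {
  double *ptr = nullptr;
  if (pinned) {
    cudaMallocHost(reinterpret_cast<void **>(&ptr), count * sizeof(double));
  } else {
    ptr = new double[count];  // pageable fallback
  }
  return ptr;
}

// Enqueue a host-to-device copy on a stream without blocking the host thread
inline void copy_to_device_async(double *device, const double *host,
                                 std::size_t count, cudaStream_t stream) {
  cudaMemcpyAsync(device, host, count * sizeof(double), cudaMemcpyHostToDevice,
                  stream);
}
```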