mvmul

This is a series of applications that perform matrix-vector multiplication using different libraries and compilers. The goal is to measure how much the final results fluctuate from build to build. For that reason, the matrix, the vector, and the resulting product vector are computed using basic Fortran data types, without calling any libraries (via a straightforward loop implementation of the operation), and the results are stored in HDF5 files, which can then be opened and read by different applications without precision loss. Makefiles and run scripts are provided, but keep in mind that they contain environment variables set for the machines I use. You will need to update them.

  • Input matrix (mat): M x N
  • Input vector (vec): N x O
  • Output vector (prod): M x O

Naturally, O = 1 for a true matrix-vector operation.

This implements the matrix-vector multiplication with loops that spell out the direct definition of the operation. All numbers are 64-bit floating-point values. The input matrix and vector elements are drawn from a uniform distribution on the unit interval using Fortran's (non-standard) rand intrinsic, and therefore have an expected value of 0.5 (in a probabilistic sense).
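A minimal sketch of what the loop version looks like (array names follow the conventions above; the standard random_number intrinsic stands in for rand here, so this is illustrative rather than the repository's exact source):

```fortran
program plain_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)          ! 64-bit reals
  integer, parameter :: M = 1000, N = 1000, O = 1
  real(dp) :: mat(M, N), vec(N, O), prod(M, O)
  integer :: i, j

  call random_number(mat)    ! uniform on the unit interval
  call random_number(vec)

  prod = 0.0_dp
  do j = 1, N                ! direct definition: prod = mat * vec
     do i = 1, M
        prod(i, 1) = prod(i, 1) + mat(i, j) * vec(j, 1)
     end do
  end do
end program plain_sketch
```

Keeping j in the outer loop and i in the inner one makes the accesses to mat contiguous in Fortran's column-major layout.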

These three data structures (the matrix and two vectors) are then stored as HDF5 datasets enclosed in a single file. You will need to know where the HDF5 library is installed on your system to make this work (or install it, which is not too hard).
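Writing a dataset with the HDF5 Fortran API follows a fixed open/create/write/close pattern; a hedged sketch (the dataset name "mat" and the surrounding variable names are assumptions, not necessarily those used in this repository):

```fortran
use hdf5
integer(hid_t)   :: file_id, space_id, dset_id
integer(hsize_t) :: dims(2)
integer          :: ierr

call h5open_f(ierr)                                         ! init the library
call h5fcreate_f("mvmul.h5", H5F_ACC_TRUNC_F, file_id, ierr)
dims = shape(mat)
call h5screate_simple_f(2, dims, space_id, ierr)            ! 2-D dataspace
call h5dcreate_f(file_id, "mat", H5T_NATIVE_DOUBLE, space_id, dset_id, ierr)
call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, mat, dims, ierr)
call h5dclose_f(dset_id, ierr)
call h5sclose_f(space_id, ierr)
! ... same pattern for the vec and prod datasets ...
call h5fclose_f(file_id, ierr)
call h5close_f(ierr)
```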

I have intentionally and explicitly set the vector dimensions so that a generalization to matrix-matrix multiplication is easily achieved. If you are interested in performance, I also included a stopwatch around the mvmul operation.
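The stopwatch can be as simple as a pair of system_clock calls around the loops (a sketch, not necessarily the exact mechanism used here):

```fortran
integer(8) :: t_start, t_stop, rate

call system_clock(t_start, count_rate=rate)
! ... matrix-vector multiplication loops ...
call system_clock(t_stop)
print '(a, f0.6, a)', "mvmul took ", &
      real(t_stop - t_start, 8) / real(rate, 8), " s"
```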

With M = N = 1000, this generates an HDF5 file of about 8 MB (a 1000 x 1000 matrix of 64-bit reals is 8 x 10^6 bytes). Notice that this is quite modest if you are testing performance. However, it scales up quickly: adding a couple of zeros to M and N brings your memory (RAM) requirements to about 80 GB, and this is not parallelized!

This is a Fortran module to open and read the HDF5 files. The subroutine read_mvmul_data in that module looks for a file named mvmul.h5 in the current directory and returns the input matrix, the input vector, and the output vector. If you are running these tests across multiple builds, I recommend creating a single mvmul.h5 by running the plain application once and then creating soft links to that file in the directories where you are working.
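Usage from a client program would look roughly like this (the argument list of read_mvmul_data is an assumption based on the description above, not the repository's exact interface):

```fortran
program check_build
  use read_h5                      ! module described above
  implicit none
  real(8), allocatable :: mat(:,:), vec(:,:), prod(:,:)

  ! Hypothetical signature: returns the three arrays stored in mvmul.h5
  call read_mvmul_data(mat, vec, prod)
end program check_build
```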

This uses the module implemented in read_h5 to read the HDF5 file generated by plain. The input matrix and vector are then multiplied using Intel MKL's DGEMV, and the results are compared element-by-element.
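DGEMV computes y := alpha*A*x + beta*y, so the call for this setup would look like the following (variable names are illustrative):

```fortran
real(8), allocatable :: y(:)
allocate(y(M))
! y := 1.0 * mat * vec + 0.0 * y  (no transpose, unit strides)
call dgemv('N', M, N, 1.0d0, mat, M, vec(:, 1), 1, 0.0d0, y, 1)
```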

This uses the module implemented in read_h5 to read the HDF5 file generated by plain. The input matrix and vector are then multiplied using BLIS's DGEMV, and the results are compared element-by-element.
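Because BLIS exports the standard BLAS interface, the DGEMV call is identical to the MKL one; only the library you link against changes. The element-by-element comparison itself can be sketched as (names illustrative):

```fortran
real(8) :: maxdiff
integer :: i

! prod(:,1) holds the result read from mvmul.h5; y the recomputed one
maxdiff = 0.0d0
do i = 1, M
   maxdiff = max(maxdiff, abs(prod(i, 1) - y(i)))
end do
print '(a, es12.5)', "max |prod - y| = ", maxdiff
```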

The idea here is the same: use the module implemented in read_h5 to read the arrays stored in an HDF5 file generated by plain, and then re-calculate the matrix-vector multiplication using calls to cuBLAS's matrix-vector multiplication routine, which runs on an NVIDIA GPU. There are a few ways to do that if we want to write host Fortran code, detailed below. One could enclose the execution of CUDA kernels with cudaEvents to keep track of performance. However, nvprof gives a good amount of detail, including the time spent in data transfers, so I recommend using that instead.

This uses CUDA Fortran to explicitly allocate arrays on the host and device, perform data transfers between them, and execute the cuBLAS call on the device.
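A hedged sketch of the CUDA Fortran variant, assuming the cublas interface module shipped with the NVIDIA HPC SDK (compile with nvfortran and link cuBLAS; all names are illustrative, not the repository's source):

```fortran
program cublas_cuf_sketch
  use cublas                       ! NVIDIA HPC SDK cuBLAS interface module
  implicit none
  integer, parameter :: M = 1000, N = 1000
  real(8)         :: mat(M, N), vec(N), prod(M)
  real(8), device :: mat_d(M, N), vec_d(N), prod_d(M)   ! device arrays

  call random_number(mat)
  call random_number(vec)

  mat_d = mat                      ! host-to-device transfers
  vec_d = vec
  call cublasDgemv('N', M, N, 1.0d0, mat_d, M, vec_d, 1, 0.0d0, prod_d, 1)
  prod = prod_d                    ! device-to-host transfer
end program cublas_cuf_sketch
```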

This uses OpenACC directives to coordinate the data movement between host and device and to execute the cuBLAS call on the device.
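With OpenACC the arrays stay ordinary host arrays; a data region handles the transfers, and host_data passes the device addresses to cuBLAS. A sketch under the same assumptions as above:

```fortran
use cublas

!$acc data copyin(mat, vec) copyout(prod)
!$acc host_data use_device(mat, vec, prod)
call cublasDgemv('N', M, N, 1.0d0, mat, M, vec, 1, 0.0d0, prod, 1)
!$acc end host_data
!$acc end data
```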
