Block matrices in BLAS methods #224

Open · ndryden opened this issue Mar 21, 2017 · 7 comments

@ndryden
Contributor

ndryden commented Mar 21, 2017

It looks like several of the BLAS methods currently only support element-wise distributed matrices as arguments, and do not support block-wise distributed matrices.

Functions I particularly care about are:

  • Hadamard
  • Dot (which implies support in HilbertSchmidt)
  • ColumnTwoNorms
  • ColumnMaxNorms

I haven't checked exhaustively, but I suspect there are more that lack block support; I'm just not using them right now.
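
For concreteness, here is a minimal sketch of the calls in question with block-wise distributed arguments (this is illustrative, not code from the issue; the matrix sizes and the [MC,MR] block distribution are assumptions, and at the time of this issue these overloads only accepted element-wise distributions):

```cpp
#include <El.hpp>

int main( int argc, char* argv[] )
{
    El::Initialize( argc, argv );
    {
        El::Grid grid( El::mpi::COMM_WORLD );

        // The fourth template argument, El::BLOCK, selects the block-cyclic
        // wrapping instead of the default element-wise one.
        El::DistMatrix<double,El::MC,El::MR,El::BLOCK> A(grid), B(grid), C(grid);
        El::Uniform( A, 1000, 1000 );
        El::Uniform( B, 1000, 1000 );

        El::Hadamard( A, B, C );           // entrywise product
        const double d = El::Dot( A, B );  // what HilbertSchmidt builds on
        // ColumnTwoNorms and ColumnMaxNorms would be exercised analogously.
        (void)d;
    }
    El::Finalize();
    return 0;
}
```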

@poulson
Member

poulson commented Mar 24, 2017

Many of these don't exist for legacy reasons (when they were originally written, El::DistMatrix<T,colDist,rowDist,El::BLOCK> did not yet exist), but adding support for each should only require a few lines of code.

Might I ask why you prefer the blocked distribution? It was primarily meant for implementing the distributed Hessenberg QR algorithm and for legacy interfaces to ScaLAPACK.

@ndryden
Contributor Author

ndryden commented Mar 24, 2017

Yes, it didn't look too hard to make the changes. I'll try to put together a pull request adding support for these methods.

The use case is for better distributed GEMM performance in code where the matrix multiplies are a major bottleneck (training neural networks).

@poulson
Member

poulson commented Mar 24, 2017

But why use block distributions for the distributed GEMM? The element-wise distributions will have better performance in Elemental. There are a few legitimate reasons to demand block distributions (e.g., for bulge-chasing algorithms like the Hessenberg QR algorithm), but element-wise distributions are much simpler, and block distributions are not quite first-class in the library (as you are finding).

@ndryden
Contributor Author

ndryden commented Mar 24, 2017

I benchmarked this some time ago and reached the opposite conclusion: block matrices ended up quite a bit faster. I didn't think that was unusual, since it agrees with what I recall from theory. However, I just rewrote and reran a quick benchmark, and the results agree with you: the element-wise distribution is faster (by roughly 60-75%). That is confusing, but perhaps there was an issue with the prior benchmark or with how I ran it.

Is the performance difference due to something inherent in the element-wise distribution, or just Elemental's distributed GEMM implementations?

@poulson
Member

poulson commented Mar 24, 2017

Despite popular opinion, there is nothing about element-wise distributions that affects the performance of GEMM relative to a block distribution, as each process stores its portion of the matrices locally and can make use of BLAS 3. Further, nothing prevents implicitly permuting the rows and columns so that one could run the same algorithm as in the blocked case on the permuted matrix (though this does not hold for factorizations because, for example, triangular structure is not preserved under such permutations).

The only change in communication pattern relative to the usual blocked algorithms is the use of MPI_Allgather rather than MPI_Bcast, but their performance is essentially the same.

The current implementation of El::Gemm for distributed dense matrices uses El::ReadProxy and El::ReadWriteProxy to redistribute into an El::DistMatrix<T,El::MC,El::MR> distribution, so if you pass in block distributed matrices, there are four extra communication steps (A, B, and C into elemental form, and C back into block form). This should account for the performance differences you are seeing (and, as a result, you should see the relative performance converge as the matrix dimensions grow, since these are quadratic costs in a cubic-cost algorithm).
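
To make those four steps concrete, here is a rough sketch of the equivalent explicit redistribution, written with El::Copy instead of the internal proxy classes (the helper name and the decision to round-trip C are illustrative assumptions, not Elemental's actual El::Gemm code):

```cpp
#include <El.hpp>

// Sketch: emulate what happens when a GEMM is requested on block-distributed
// arguments by explicitly redistributing into the elemental [MC,MR]
// distribution, multiplying there, and copying the result back.
void BlockGemmViaElemental
( const El::DistMatrix<double,El::MC,El::MR,El::BLOCK>& ABlock,
  const El::DistMatrix<double,El::MC,El::MR,El::BLOCK>& BBlock,
        El::DistMatrix<double,El::MC,El::MR,El::BLOCK>& CBlock )
{
    const El::Grid& grid = ABlock.Grid();
    El::DistMatrix<double,El::MC,El::MR> A(grid), B(grid), C(grid);
    El::Copy( ABlock, A );  // redistribution 1 (quadratic cost)
    El::Copy( BBlock, B );  // redistribution 2
    El::Copy( CBlock, C );  // redistribution 3
    El::Gemm( El::NORMAL, El::NORMAL, 1., A, B, 0., C );  // cubic-cost multiply
    El::Copy( C, CBlock );  // redistribution 4: back into block form
}
```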

@ndryden
Contributor Author

ndryden commented Mar 24, 2017

Okay, that's good to know, and it explains the performance I'm seeing with my latest benchmark quite well. (The performance difference between block and elemental distributions is larger for a GEMM on 2^14 x 2^14 square matrices than for one on 2^15 x 2^15 square matrices.)
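
As a back-of-envelope check (a rough model, not a measurement from this thread): each extra redistribution of an n x n matrix moves on the order of $n^2$ entries, while the multiply performs roughly $2n^3$ flops, so

$$
\frac{T_{\text{redistribute}}}{T_{\text{GEMM}}} \;\sim\; \frac{c_1\,n^2}{c_2\,n^3} \;=\; \frac{c_1}{c_2}\cdot\frac{1}{n},
$$

where $c_1$ and $c_2$ are machine-dependent constants. Doubling $n$ from $2^{14}$ to $2^{15}$ roughly halves the relative overhead, which matches the gap being larger at $2^{14}$ than at $2^{15}$.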

I already have code to support block distributions in the operations I mentioned above, and I don't think there's any sense in not contributing it. I'll make a pull request after I have another pair of eyes look over it for any issues.

@poulson
Member

poulson commented Mar 24, 2017

Great, I'll be looking forward to the PR!
