Sparse matrix vector product benchmark (in comparison with spark and dask) #68
The numbers for the 3Mx3M test matrix built from https://snap.stanford.edu/data/com-Orkut.html look like this:
scipy.sparse single threaded: 1.2s
halo (1 node, 4 workers): 600ms
halo (2 nodes, 4 workers each): 430ms
dask (4 workers): 1.0s
dask.distributed (1 node, 4 workers): 11s
dask.distributed (2 nodes, 4 workers each): 8.1s
Distributed Dask presumably performs poorly because it has no object store where the sparse matrix blocks can be kept, so the blocks have to be serialized and shipped to the workers. The single-node version of Dask avoids serialization entirely, but is limited by the Python GIL.
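For reference, here is a minimal sketch of how a benchmark like this can be run locally. It is not the exact script used for the numbers above; the file name, block count, and timing approach are assumptions. It loads the com-Orkut edge list into a scipy.sparse CSR matrix, times the single-threaded matvec, and then times a blocked variant where row blocks are multiplied in parallel threads via dask.delayed:

```python
import time

import dask
import numpy as np
import scipy.sparse as sp

# Load the com-Orkut edge list (comment lines starting with '#' are skipped)
# and build a CSR adjacency matrix. "com-orkut.ungraph.txt" is the file name
# from the SNAP download; loading ~117M edges this way is slow but simple.
edges = np.loadtxt("com-orkut.ungraph.txt", dtype=np.int64)
n = int(edges.max()) + 1
A = sp.csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
x = np.random.rand(n)

# Single-threaded scipy.sparse baseline.
start = time.time()
y = A.dot(x)
print("scipy.sparse:", time.time() - start)

# Blocked variant: split A into row blocks, multiply each block with the full
# vector in parallel, and concatenate the partial results.
num_blocks = 4
bounds = np.linspace(0, n, num_blocks + 1, dtype=int)
blocks = [A[bounds[i]:bounds[i + 1]] for i in range(num_blocks)]

partials = [dask.delayed(blk.dot)(x) for blk in blocks]
start = time.time()
y_blocked = np.concatenate(dask.compute(*partials, scheduler="threads"))
print("dask (threads):", time.time() - start)
```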
For PySpark, the full 3Mx3M matrix triggered a serialization error; using a 2Mx2M matrix gives:
scipy.sparse single threaded: 0.76s
spark (1 node, 4 workers): 1.41s
spark (2 nodes, 4 workers each): 1.56s
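For comparison, a hedged sketch of the PySpark variant under the same blocked scheme. Here `sc` (an existing SparkContext), `blocks`, and `x` are assumed to come from the sketch above, and `spark_matvec` is a name made up for illustration; shipping each scipy.sparse block through Spark's serializer is also where the full 3Mx3M matrix runs into trouble:

```python
import numpy as np

def spark_matvec(sc, blocks, x):
    """Blocked sparse matvec on an existing SparkContext `sc` (sketch)."""
    # Broadcast the dense vector once so every task reuses the same copy.
    x_b = sc.broadcast(x)
    # One partition per scipy.sparse row block; each task does a local matvec.
    rdd = sc.parallelize(blocks, numSlices=len(blocks))
    partials = rdd.map(lambda blk: blk.dot(x_b.value)).collect()
    return np.concatenate(partials)

# y = spark_matvec(sc, blocks, x)
```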
Before this is merged, we should check with the author of Dask whether there is a more efficient way to implement these operations.