RDMA put() throughput low below 1MB level #103

TylerADavis · 2017-07-20T22:00:29Z

At a 1 MB object size, the RDMA code reaches a put throughput of ~2.5 GB/s. At smaller object sizes, however, throughput drops sharply.

For puts:
128 B: ~1 MB/s
4 KB: 37 MB/s
50 KB: 400 MB/s
1 MB: ~2.5 GB/s

For gets:
128 B: ~1 MB/s
4 KB: ~37 MB/s
50 KB: ~375 MB/s
1 MB: ~2 GB/s

Small-object throughput should be improved.
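To see how much of each put is fixed overhead rather than data transfer, the time per operation implied by each measurement is just object size divided by throughput. The C++ sketch below computes it, assuming binary units (KB = 1024 B, MB = 2^20 B, GB = 2^30 B) and that "50k" means 50 KB; `per_op_us` and `print_put_times` are hypothetical helpers, not part of the RDMA code.

```cpp
#include <cassert>
#include <cstdio>

// Assumed binary units: KB = 1024 B, MB = 2^20 B, GB = 2^30 B.
constexpr double KiB = 1024.0;
constexpr double MiB = 1024.0 * KiB;
constexpr double GiB = 1024.0 * MiB;

// Per-operation time in microseconds implied by a measured throughput:
// time = object size / throughput.
double per_op_us(double object_bytes, double bytes_per_sec) {
    assert(object_bytes > 0.0 && bytes_per_sec > 0.0);
    return object_bytes / bytes_per_sec * 1e6;
}

// Prints the implied time per put for the put measurements reported above.
void print_put_times() {
    struct Row { const char* label; double bytes; double rate; };
    const Row rows[] = {
        {"128 B", 128.0, 1.0 * MiB},
        {"4 KB", 4.0 * KiB, 37.0 * MiB},
        {"50 KB", 50.0 * KiB, 400.0 * MiB},
        {"1 MB", 1.0 * MiB, 2.5 * GiB},
    };
    for (const Row& r : rows) {
        std::printf("%-6s %7.1f us/op\n", r.label, per_op_us(r.bytes, r.rate));
    }
}
```

Under those unit assumptions, the 128 B, 4 KB and 50 KB puts each take roughly 106 to 122 µs, and the 1 MB put about 391 µs. A per-operation time that stays nearly flat from 128 B up to 50 KB points to a fixed per-message cost rather than a bandwidth limit.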

TylerADavis · 2017-07-20T22:03:24Z

Currently, the RDMA code calls ibv_reg_mr to register the memory holding every object it sends. This may be a bottleneck, and because registration is a per-message cost, it would hurt small messages proportionally more than large ones.
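The effect of a per-message cost can be shown with a simple model: each message pays a fixed overhead (for example one ibv_reg_mr call) plus its size divided by link bandwidth. This is a hypothetical model, not a measurement of the RDMA code, and the overhead and bandwidth values used below are placeholders.

```cpp
#include <cassert>

// Hypothetical model of one message:
//   time(s) = fixed_overhead_s + s / link_bytes_per_s
// Effective throughput is s / time(s).
double effective_throughput(double msg_bytes,
                            double fixed_overhead_s,
                            double link_bytes_per_s) {
    assert(msg_bytes > 0.0 && fixed_overhead_s >= 0.0 && link_bytes_per_s > 0.0);
    const double t = fixed_overhead_s + msg_bytes / link_bytes_per_s;
    return msg_bytes / t;
}

// Fraction of a message's total time spent on the fixed overhead.
double overhead_fraction(double msg_bytes,
                         double fixed_overhead_s,
                         double link_bytes_per_s) {
    assert(msg_bytes > 0.0 && fixed_overhead_s >= 0.0 && link_bytes_per_s > 0.0);
    const double t = fixed_overhead_s + msg_bytes / link_bytes_per_s;
    return fixed_overhead_s / t;
}
```

For example, with a hypothetical 100 µs overhead and a hypothetical 4 GB/s link, over 99% of a 128 B message's time goes to the fixed overhead, versus about 28% for a 1 MB message.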

TylerADavis · 2017-07-20T22:27:36Z

Figure 8 of http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.189.1285&rep=rep1&type=pdf shows that, up to a certain message size, registering memory costs more than copying it. This would explain why some of the RDMA results are worse than TCP and some are better.
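The trade-off in that figure can be captured with a linear cost model. The sketch assumes registration costs a fixed amount plus a per-byte term (for pinning pages), and copying costs bytes divided by memcpy bandwidth. All three parameters are placeholders, not values from the paper, and the function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Hypothetical linear cost models:
//   register(s) = reg_fixed_s + reg_per_byte_s * s
//   copy(s)     = s / memcpy_bytes_per_s
// Returns the message size below which copying into an already-registered
// buffer is cheaper than registering the message's own memory.
double copy_register_crossover_bytes(double reg_fixed_s,
                                     double reg_per_byte_s,
                                     double memcpy_bytes_per_s) {
    assert(reg_fixed_s >= 0.0 && reg_per_byte_s >= 0.0 && memcpy_bytes_per_s > 0.0);
    const double copy_per_byte_s = 1.0 / memcpy_bytes_per_s;
    if (copy_per_byte_s <= reg_per_byte_s) {
        // Copying is never more expensive per byte, so it wins at every size.
        return std::numeric_limits<double>::infinity();
    }
    // Solve s / bw < fixed + per_byte * s for s.
    return reg_fixed_s / (copy_per_byte_s - reg_per_byte_s);
}

// Per-message decision: copy small messages, register large ones.
bool prefer_copy(double msg_bytes, double crossover_bytes) {
    return msg_bytes < crossover_bytes;
}
```

With measured values for the three parameters, a size check like `prefer_copy()` could decide per message whether to serialize into a preregistered buffer or register the object's own memory.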

TylerADavis · 2017-07-20T23:19:22Z

@jcarreira I think the following may be an effective way to improve RDMA throughput. What do you think?

- Change the RDMA code to adopt the new interface from Speed improvements #90 (also described in the high-level design document), so that the serializer can write into a caller-specified buffer.
- Add a queue of preregistered RDMA memory regions (MRs) to the RDMAClient. These MRs will be used only for send operations.
- If an MR is available, pop it from the queue, serialize into it, and send from it. This avoids the cost of registering a new MR.
- If no MR is available, allocate a new buffer, call ibv_reg_mr on it, then serialize into it.
- Update the RDMAOpInfo struct so that it holds a pointer to the buffer used for the operation (for writes), as well as to the RDMAMem corresponding to the send MR.
- When an operation completes, return its MR to the queue if the queue's size is below some threshold. Otherwise, deregister the MR and free the underlying memory.
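The proposal above could look roughly like the following C++ sketch. It is not the RDMAClient code: `SendBuffer`, `SendBufferPool`, `PendingSend` and `on_send_complete` are hypothetical names, pooled buffers share one fixed size, and the register/deregister callbacks stand in for ibv_reg_mr and ibv_dereg_mr so the pooling logic can run without an RDMA device. Objects larger than the pooled size get a one-off registration and are always deregistered on completion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical registered send buffer. In a real client, mr_handle would be
// the struct ibv_mr* returned by ibv_reg_mr().
struct SendBuffer {
    std::unique_ptr<char[]> data;
    std::size_t capacity = 0;
    void* mr_handle = nullptr;
};

// Pool of preregistered, fixed-size send buffers (not thread-safe).
class SendBufferPool {
public:
    using RegisterFn = std::function<void*(char*, std::size_t)>;
    using DeregisterFn = std::function<void(void*)>;

    SendBufferPool(std::size_t buffer_size, std::size_t max_pooled,
                   RegisterFn reg, DeregisterFn dereg)
        : buffer_size_(buffer_size), max_pooled_(max_pooled),
          reg_(std::move(reg)), dereg_(std::move(dereg)) {}
    SendBufferPool(const SendBufferPool&) = delete;
    SendBufferPool& operator=(const SendBufferPool&) = delete;

    // Deregister every buffer still sitting in the pool.
    ~SendBufferPool() {
        for (SendBuffer& b : free_) dereg_(b.mr_handle);
    }

    // Reuse a pooled buffer when the object fits; otherwise allocate and
    // register a new one (the slow path the pool is meant to avoid).
    SendBuffer acquire(std::size_t needed) {
        if (needed <= buffer_size_ && !free_.empty()) {
            SendBuffer b = std::move(free_.front());
            free_.pop_front();
            return b;
        }
        SendBuffer b;
        b.capacity = needed <= buffer_size_ ? buffer_size_ : needed;
        b.data = std::make_unique<char[]>(b.capacity);
        b.mr_handle = reg_(b.data.get(), b.capacity);
        assert(b.mr_handle != nullptr);
        return b;
    }

    // On completion: keep standard-size buffers while the pool is under its
    // threshold; otherwise deregister (memory is freed when b is destroyed).
    void release(SendBuffer b) {
        if (b.capacity == buffer_size_ && free_.size() < max_pooled_) {
            free_.push_back(std::move(b));
            return;
        }
        dereg_(b.mr_handle);
    }

    std::size_t pooled() const { return free_.size(); }

private:
    std::size_t buffer_size_;
    std::size_t max_pooled_;
    RegisterFn reg_;
    DeregisterFn dereg_;
    std::deque<SendBuffer> free_;
};

// Hypothetical per-operation record, mirroring the proposed RDMAOpInfo change:
// it owns the send buffer until the send completes.
struct PendingSend {
    SendBuffer buf;
};

void on_send_complete(SendBufferPool& pool, PendingSend& op) {
    pool.release(std::move(op.buf));
}
```

A `std::deque` keeps push and pop at the ends cheap. A real client would also need locking, or a per-thread pool, if sends complete on a different thread than the one that issued them.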

It may be possible to do something similar to optimize the read() methods, but it will likely be more complex.
