RDMA Scatter/Gather is a nice way to consolidate data transfers. For example, the verbs API allows data from multiple local locations to be written to a remote buffer with a SINGLE RDMA Write operation, or data in a remote buffer to be read into multiple local locations with a SINGLE RDMA Read operation. This is attractive, but it seems that nobody has reported the benefits they get from this feature.
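To make the semantics concrete, here is a minimal pure-Python model of the byte movement a multi-SGE work request describes. It does not touch RDMA hardware or libibverbs: the `SGE` class and the `gather_write`/`scatter_read` helpers are hypothetical names for illustration only. In the real verbs API the equivalent is a single `struct ibv_send_wr` whose `sg_list`/`num_sge` fields list the local pieces and whose `opcode` is `IBV_WR_RDMA_WRITE` or `IBV_WR_RDMA_READ`, posted with `ibv_post_send`; the number of SGEs per request is bounded by the `max_send_sge` capability requested when the QP is created.

```python
from dataclasses import dataclass


@dataclass
class SGE:
    """Hypothetical stand-in for one scatter/gather element.

    Plays the role of `struct ibv_sge` (addr, length, lkey), but here a
    "buffer" is a bytearray plus an offset instead of a registered address.
    """
    buf: bytearray
    offset: int
    length: int

    def __post_init__(self):
        if self.offset < 0 or self.length < 0 or self.offset + self.length > len(self.buf):
            raise ValueError("SGE lies outside its local buffer")


def gather_write(sges, remote: bytearray, remote_offset: int) -> int:
    """Model an RDMA Write with several SGEs: the local pieces are read in
    list order and land back-to-back in ONE contiguous remote range."""
    total = sum(s.length for s in sges)
    if remote_offset < 0 or remote_offset + total > len(remote):
        raise ValueError("write would overrun the remote buffer")
    pos = remote_offset
    for s in sges:
        remote[pos:pos + s.length] = s.buf[s.offset:s.offset + s.length]
        pos += s.length
    return total  # bytes moved by the single (modeled) operation


def scatter_read(remote: bytes, remote_offset: int, sges) -> int:
    """Model an RDMA Read with several SGEs: ONE contiguous remote range is
    split, in list order, across the local buffers."""
    total = sum(s.length for s in sges)
    if remote_offset < 0 or remote_offset + total > len(remote):
        raise ValueError("read would overrun the remote buffer")
    pos = remote_offset
    for s in sges:
        s.buf[s.offset:s.offset + s.length] = remote[pos:pos + s.length]
        pos += s.length
    return total


if __name__ == "__main__":
    remote = bytearray(8)
    gather_write([SGE(bytearray(b"AAAA"), 0, 4), SGE(bytearray(b"..BB"), 2, 2)], remote, 1)
    print(remote)  # bytearray(b'\x00AAAABB\x00')
```

The point of the model is that the two local pieces are not contiguous, yet one request places them contiguously at the remote side, which is what saves the extra work requests and completions.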
One possible reason is the limited SRAM on the RNIC, which may cause two problems:
1. The remote and local HCAs cannot store much data, which limits the number of SGEs per request (SGE_NUM). However, recent RNICs have much larger SRAM than earlier generations: the Mellanox CX4 RNIC has about 2 MB of SRAM, which is big enough for us because a node feature in a GNN is only about 1 to 4 KB.
2. Storing too much data in the HCA may leave too little memory for the MTT/MPT caches, which may cause severe MTT cache thrashing (shootdowns) and heavy PCIe/DMA overhead for address translation. However, this can be solved by using large physically contiguous memory (allocated with the Linux CMA, the Contiguous Memory Allocator) together with physical memory regions, a new feature in Mellanox CX5.
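The two points above can be put into rough numbers. This sketch is only arithmetic under stated assumptions: it takes the ~2 MB CX4 SRAM figure at face value as if all of it were available, treats 1 GiB as a hypothetical size for a registered feature store, and counts translation entries as one per page (real MTT layouts and caching policies are device-specific and not modeled). The helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def features_that_fit(sram_bytes: int, feature_bytes: int) -> int:
    """Optimistic upper bound on how many feature vectors fit in NIC SRAM,
    assuming (unrealistically) that all of the SRAM is free for them."""
    return sram_bytes // feature_bytes


def translation_entries(region_bytes: int, page_bytes: int) -> int:
    """Page-granular translation entries needed to map one memory region:
    one entry per page, rounding up for a partial last page."""
    return (region_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    sram = 2 * MIB  # the ~2 MB CX4 figure quoted above, taken at face value
    for feat in (1 * KIB, 4 * KIB):
        print(f"{feat // KIB} KiB features in 2 MiB SRAM:", features_that_fit(sram, feat))
    region = 1 * GIB  # hypothetical size of a registered feature store
    for page in (4 * KIB, 2 * MIB):
        print(f"1 GiB region with {page // KIB} KiB pages:", translation_entries(region, page))
```

With these inputs, 2 MiB holds 2048 features of 1 KiB or 512 of 4 KiB, while mapping 1 GiB takes 262,144 page entries with 4 KiB pages but only 512 with 2 MiB pages. That shrinking of translation state is the kind of effect large physically contiguous allocations aim for.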
max_rd_atomic is a crucial QP attribute for performance: it is the number of RDMA Read and atomic operations that an RC QP, acting as the initiator, can have outstanding at any time. I still do not understand why setting this attribute larger than 1 helps so much, and we need to find out the reasons behind it.
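A back-of-the-envelope model may help frame the question. Assuming each RDMA Read costs one full network round trip and ignoring NIC and PCIe processing, an RC QP with `max_rd_atomic = 1` completes at most one Read per RTT, so its throughput is bounded by message size divided by RTT no matter how fast the link is; allowing N outstanding Reads raises that bound N-fold until the link saturates (Little's law). This is one plausible contributing factor, not a verified explanation; the function `read_goodput_gbps` and all numbers below are illustrative assumptions, not measurements. In libibverbs the initiator-side value is `ibv_qp_attr.max_rd_atomic`, applied with the `IBV_QP_MAX_QP_RD_ATOMIC` mask in `ibv_modify_qp` during the RTS transition, and the responder's `max_dest_rd_atomic` (set at RTR with `IBV_QP_MAX_DEST_RD_ATOMIC`) should be at least as large.

```python
def read_goodput_gbps(max_rd_atomic: int, msg_bytes: int,
                      rtt_us: float, link_gbps: float) -> float:
    """Rough upper bound on RDMA Read goodput for ONE RC QP.

    Hypothetical model: each Read occupies one full round trip, and the
    initiator keeps at most `max_rd_atomic` Reads outstanding, so the QP
    completes at most max_rd_atomic Reads per RTT (Little's law). The result
    is capped by the link rate; NIC, PCIe, and congestion effects are ignored.
    """
    if max_rd_atomic < 1:
        raise ValueError("max_rd_atomic must be >= 1")
    bits_in_flight = max_rd_atomic * msg_bytes * 8
    # 1 us = 1000 ns, and bits per nanosecond equals Gbit/s
    window_limited = bits_in_flight / (rtt_us * 1000.0)
    return min(window_limited, link_gbps)


if __name__ == "__main__":
    # Illustrative inputs only: 4 KiB reads, 2 us round trip, 100 Gbit/s link.
    for depth in (1, 2, 4, 8, 16):
        print(depth, round(read_goodput_gbps(depth, 4096, 2.0, 100.0), 1))
```

Under these assumed inputs, a depth of 1 caps the QP at roughly 16 Gbit/s, while a depth of 8 already exceeds the 100 Gbit/s link, so the bound becomes the link itself.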
eedalong changed the title from "On the verification and analysis of RDMA's Scatter-Gather feature, and analysis of the MAX_RD_ATOMIC parameter" to "About RDMA Scatter/ Gather & RC QP's max_rd_atomic" on May 26, 2022.