Does BytePS support multiple NICs now? #408
You can use the UCX van and enable multi-rail. @eric-haibin-lin, do you have brief instructions?
@eric-haibin-lin Could you please share the documentation for using multiple NICs?
@bobzhuyb By the way, why did you choose UCX to implement multi-NIC support?
Because UCX has native multi-NIC support, and it comes from Mellanox (the RDMA NIC vendor), which is part of NVIDIA now.
Thanks a lot. Looking forward to the multi-NIC documentation.
Hi @wuyujiji, you need to update the ps-lite commit to the latest one (49e4582), which contains several important fixes for UCXVan. There's a small patch for byteps to use the latest ps-lite; @pleasantrabbit can share/PR the patch. In order to use multiple NICs, you will need to build ucx with rdma support (https://github.com/bytedance/ps-lite#build) and specify env var For serious performance benchmarks, I suggest you run the UCXVan performance test in your cluster first. You can run this test https://github.com/bytedance/ps-lite#3-other-benchmarks to get an idea of the push-pull speed with multiple NICs once you upgrade to the latest UCXVan.
@eric-haibin-lin @pleasantrabbit Thanks for your wonderful work! I will test it in my cluster. By the way, did you test and compare the performance (push-pull speed and end-to-end model) between a single NIC and multiple NICs?
@bobzhuyb Hello, I have a more detailed question. With multiple NICs, does BytePS assign each 4M tensor to a different NIC, or does it split a 4M tensor into sub-tensors and distribute them across the NICs?
We hand the whole 4M tensor to UCX. UCX then splits it across the NICs by itself.
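To make the idea of splitting one tensor across NICs concrete, here is a minimal sketch of even striping. It is only an illustration: the function name `stripe` and the even-split policy are assumptions made for this example, and the thread does not describe how UCX actually chooses fragment sizes or rails.

```python
from typing import List, Tuple


def stripe(nbytes: int, n_rails: int) -> List[Tuple[int, int]]:
    """Split a buffer of nbytes into n_rails contiguous (offset, length) chunks.

    Hypothetical helper: an even split with the remainder spread over the
    first rails. This is NOT UCX's real scheduling logic.
    """
    if n_rails <= 0:
        raise ValueError("n_rails must be positive")
    base, extra = divmod(nbytes, n_rails)
    chunks = []
    offset = 0
    for rail in range(n_rails):
        # The first `extra` rails carry one additional byte each.
        length = base + (1 if rail < extra else 0)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 4 MiB tensor over two NICs: each rail carries one contiguous half.
print(stripe(4 * 1024 * 1024, 2))
```

The point is only that the caller passes one buffer and the transport layer decides the per-NIC pieces, so the application does not need to pre-split tensors.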
If I remember correctly, with one ps-lite worker per node it reaches about 300 Gb/s with two 200 Gb/s NICs. In the internal version, we create multiple ps-lite worker instances per node to increase the goodput.
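As a quick sanity check on those numbers, the arithmetic below computes the fraction of aggregate line rate achieved. The function name `link_efficiency` is made up for this sketch, and the inputs are the approximate figures quoted from memory above, not a measured benchmark.

```python
def link_efficiency(goodput_gbps: float, nic_gbps: float, n_nics: int) -> float:
    """Return achieved goodput as a fraction of the aggregate NIC line rate."""
    return goodput_gbps / (nic_gbps * n_nics)


# About 300 Gb/s over two 200 Gb/s NICs (figures as recalled in the thread).
eff = link_efficiency(300, 200, 2)
print(f"{eff:.0%} of the 400 Gb/s aggregate line rate")
```

So a single ps-lite worker per node already exceeds what one 200 Gb/s NIC could deliver, while leaving roughly a quarter of the aggregate bandwidth unused, which is consistent with the remark that more worker instances per node raise the goodput.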
@bobzhuyb @eric-haibin-lin OK, thanks for your reply!
Hello! Has BytePS implemented multi-NIC support internally?