Please take a look at our publications that contain experiments for different algorithms implemented in hivemind: https://github.com/learning-at-home/hivemind#citation (take a look at the newest papers in "Additional publications" too). Hope it helps!
Hivemind has several components that have different scaling properties.
For instance, hivemind.dht.DHT scales to 8192 nodes more or less seamlessly, and could probably scale further if we had the RAM (and patience) to test it.
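One intuition for why a Kademlia-style DHT scales so gracefully: each lookup only contacts O(log N) peers, so going from 256 to 8192 nodes adds just a handful of extra hops. A back-of-the-envelope sketch (the hop model is the textbook Kademlia idealization, not a measurement of hivemind itself):

```python
import math

def expected_lookup_hops(num_nodes: int) -> int:
    # Kademlia-style routing roughly halves the remaining keyspace
    # per hop, so a lookup needs about log2(N) hops among N nodes.
    return math.ceil(math.log2(num_nodes))

for n in (256, 1024, 8192):
    print(f"{n} nodes -> ~{expected_lookup_hops(n)} hops")
# 8192 nodes is only ~13 hops per lookup, vs ~8 at 256 nodes
```

This is why DHT routing-table and lookup costs stay nearly flat as the swarm grows; the practical limits tend to be RAM and churn handling rather than the routing algorithm.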
In turn, hivemind.Optimizer requires some tweaking to go beyond 256 nodes: different averaging timeouts and/or multiple averaging groups. The only time (to my knowledge) we tested it with more than 1k nodes, it required multiple averaging groups, as in this paper.
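To get a feel for why very large runs need multiple averaging groups: if each all-reduce group can only hold g peers, fully mixing updates across N peers takes roughly ceil(log_g N) rounds of group averaging (the idea behind Moshpit-style group averaging). A hedged sketch with illustrative numbers, not a description of hivemind's actual matchmaking:

```python
import math

def mixing_rounds(num_peers: int, group_size: int) -> int:
    # With averaging groups of `group_size` peers, information spreads by
    # a factor of `group_size` per round, so reaching all `num_peers`
    # takes about log_g(N) rounds (log2 used for exact integer ratios).
    return math.ceil(math.log2(num_peers) / math.log2(group_size))

# 1024 peers: groups of 32 fully mix in 2 rounds, groups of 8 need 4
print(mixing_rounds(1024, 32), mixing_rounds(1024, 8))
```

The trade-off is that larger groups mix faster but are more fragile (one slow or dropped peer stalls the whole group), which is why timeouts and group sizes need retuning as the swarm grows.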
As for hivemind.moe, its scaling properties depend on the network design. A model with multiple smaller MoE layers scales to more nodes than one big MoE layer, and a 2D expert grid scales better than a 1D grid. I'd hazard a guess that a single MoE layer can scale to thousands of nodes with some tinkering (grid shape, beam search params), but I haven't ever done that.
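A quick illustration of why a factored 2D grid helps: when expert uids are composed per dimension, beam search can score each dimension's coordinates separately, so the number of candidates scored grows with the sum of the grid's side lengths rather than their product. The counting below is a simplification (it ignores beam width, which multiplies the later dimensions' cost by a small constant), and the numbers are illustrative:

```python
import math

def experts_in_grid(grid_dims: list[int]) -> int:
    # Total experts addressable by the grid: product of side lengths.
    return math.prod(grid_dims)

def candidates_scored(grid_dims: list[int]) -> int:
    # Per-dimension beam search scores each dimension's coordinates
    # independently (beam width ignored for simplicity), so the gating
    # cost scales with the SUM of side lengths, not the product.
    return sum(grid_dims)

# Both grids address 4096 experts, but the 2D grid's gating function
# scores 64 + 64 = 128 coordinates instead of all 4096:
print(experts_in_grid([4096]), candidates_scored([4096]))
print(experts_in_grid([64, 64]), candidates_scored([64, 64]))
```

The same logic suggests why "many smaller MoE layers" beats "one big MoE": each layer's grid stays small enough for cheap routing while the total expert count across layers keeps growing.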
Hello,
I am researching P2P solutions and am wondering: how well does Hivemind scale?
Thanks