In order to support the hybrid FSDP use case, where hivemind is used for decentralized training between nodes that each run FSDP, we need to be able to send PyTorch DTensors. (This would at least work with FSDP2; FSDP1 is slightly more complicated.)
I see two ways of supporting it (a rough sketch of both is below):
Either we have one hivemind worker per node, which would call DTensor.full_tensor() and then send the gathered tensor,
or
each local rank has its own hivemind worker and sends and receives DTensor.to_local().
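To make the two options concrete, here is a minimal sketch of the DTensor side of the conversion only; the hivemind wiring (worker setup, averaging, serialization) is omitted, and the function name, the one_worker_per_node flag, and the import path are my own assumptions for illustration, not existing hivemind or torch API surface beyond DTensor.full_tensor() / DTensor.to_local() themselves:

```python
from typing import Optional

import torch
import torch.distributed as dist
# On older PyTorch releases this module lives at torch.distributed._tensor instead.
from torch.distributed.tensor import DTensor


def tensor_to_send(param: DTensor, one_worker_per_node: bool) -> Optional[torch.Tensor]:
    """Turn an FSDP2-sharded parameter (a DTensor) into a plain torch.Tensor
    that a hivemind worker could serialize and send.

    `one_worker_per_node` is only an illustrative flag for this sketch.
    """
    if one_worker_per_node:
        # Option 1: full_tensor() all-gathers the shards (it is a collective, so
        # every rank on the node must call it), but only one rank per node hands
        # the result to its hivemind worker.
        full = param.full_tensor()
        return full if dist.get_rank() == 0 else None
    # Option 2: each local rank runs its own hivemind worker and sends just its
    # own shard; to_local() involves no communication.
    return param.to_local()
```

On the receiving side, option 2 could write an averaged shard back in place via param.to_local().copy_(received), whereas option 1 would need to re-shard or broadcast the full tensor within the node (e.g. with distribute_tensor), which is part of why the per-rank-worker option may be simpler despite running more workers.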