Describe the bug
When reading the paper, I noticed that during training, hardware utilization is improved at the cost of stale gradients. I am wondering whether we could avoid stale gradients at the cost of hardware utilization when using hivemind. Can we force a worker to wait until it has received the result for the first batch before it starts processing the next one?
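To make the trade-off concrete, here is a minimal, framework-agnostic sketch. It is not hivemind's API, and `run_worker` and `max_in_flight` are hypothetical names. The sketch assumes that each finished batch applies exactly one update to the shared parameters. For each gradient, it counts how many updates happened between the moment the batch read the parameters and the moment its gradient was applied (the gradient's "staleness"). With `max_in_flight=1`, the worker waits for each result before starting the next batch, so every gradient is fresh. Allowing more batches in flight keeps the hardware busier, but the gradients become stale.

```python
from collections import deque
from typing import List


def run_worker(num_batches: int, max_in_flight: int) -> List[int]:
    """Simulate one worker and return the staleness of each applied gradient.

    Hypothetical toy model (not hivemind's API):
    - max_in_flight == 1: block until the previous batch's result arrives
      before starting the next one (no stale gradients, lower utilization).
    - max_in_flight > 1: pipeline batches (higher utilization, stale gradients).
    """
    params_version = 0  # incremented once per applied gradient
    in_flight = deque()  # parameter version each outstanding batch was computed on
    staleness: List[int] = []

    def apply_oldest() -> None:
        nonlocal params_version
        used_version = in_flight.popleft()
        # Number of updates applied since this batch read the parameters
        staleness.append(params_version - used_version)
        params_version += 1

    for _ in range(num_batches):
        if len(in_flight) == max_in_flight:
            # Pipeline is full: wait for the oldest result before continuing
            apply_oldest()
        in_flight.append(params_version)  # start a new batch on current params

    while in_flight:  # drain the remaining results
        apply_oldest()
    return staleness
```

For example, `run_worker(3, 1)` returns `[0, 0, 0]`, while `run_worker(4, 2)` returns `[0, 1, 1, 1]`. Once the pipeline fills, each gradient is one update behind.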