[BUG] stale gradients #514

elricwan · 2022-10-18T21:27:34Z

Describe the bug
When I read the paper, I noticed that training improves hardware utilization at the cost of stale gradients. I am wondering whether we could instead avoid stale gradients at the cost of hardware utilization when using hivemind. Can we force a worker to wait until it has received the result of the first batch before it starts processing the next one?
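
To make the question concrete, here is a toy sketch of the trade-off. It is plain Python and does not use hivemind's API. The function `run` and its `delay` parameter are hypothetical names made up for this illustration. With `delay=0` every update uses a gradient computed on the current parameters (synchronous, no staleness). With `delay=1` each update uses a gradient computed one step earlier, which is roughly what I understand by a stale gradient.

```python
def run(delay: int, steps: int, lr: float, w0: float = 1.0) -> float:
    """Minimize f(w) = 0.5 * w**2 with gradients computed `delay` steps ago.

    Toy illustration only; `delay` is a hypothetical knob, not a hivemind option.
    """
    history = [w0]  # history[t] holds the parameter value at step t
    w = w0
    for t in range(steps):
        # Parameters the gradient was computed on; clamp to the initial value
        # while fewer than `delay` steps have elapsed.
        stale_w = history[max(0, t - delay)]
        grad = stale_w  # df/dw = w for f(w) = 0.5 * w**2
        w = w - lr * grad
        history.append(w)
    return w


sync_result = run(delay=0, steps=3, lr=0.5)   # fresh gradients
stale_result = run(delay=1, steps=3, lr=0.5)  # gradients one step old
print(sync_result, stale_result)
```

In this toy setting the synchronous run shrinks `w` monotonically toward the minimum at 0, while the delayed run overshoots past it. What I am asking is whether hivemind can be configured to behave like the `delay=0` case, accepting that workers sit idle while they wait.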

elricwan added the bug Something isn't working label Oct 18, 2022

elricwan assigned justheuristic Oct 18, 2022
