Commit

Update monkeypatch to put barrier in optim load (mosaicml#2874)
* wip

* bugfix

* increase retries and jitter

* logs

* logs

* remove kadabra

* add sync

* remove

* no sync

* logs

* tweak

* strip print

* strip

* upload file

* remove comment

* remove

---------

Co-authored-by: Abhinav Venigalla <[email protected]>
mvpatel2000 and abhi-mosaic authored Jan 17, 2024
1 parent 2fd6c77 commit e1728f6
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion composer/trainer/mosaic_fsdp_utils.py
@@ -1163,7 +1163,6 @@ def _shard_orig_param_state(
 optim_state,
 pg=fsdp_state.process_group,
 device=fsdp_state.compute_device,
-cpu_offload=True,
 )
 if not shard_param_info.in_shard:
 return {}
@@ -1179,6 +1178,7 @@ def _shard_orig_param_state(
 ):
 value = value.flatten()[intra_param_start_idx : intra_param_end_idx + 1].clone() # type: ignore[operator]
 new_optim_state[state_name] = value
+torch.cuda.synchronize()
 return new_optim_state

 def fsdp_state_has_default_pg(state: '_FSDPState') -> bool:
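
For readers skimming the second hunk, the slice-then-synchronize step can be sketched in isolation. This is a minimal illustration, not Composer's implementation: the helper name `shard_state_value`, its parameters, and the sample tensor are hypothetical. It assumes the end index is inclusive (hence the `+ 1` in the diff), and it guards the `torch.cuda.synchronize()` call so the sketch also runs on CPU-only machines, whereas the commit calls it unconditionally.

```python
import torch


def shard_state_value(value: torch.Tensor, start_idx: int, end_idx: int) -> torch.Tensor:
    """Hypothetical helper: return one rank's slice of a flattened state tensor.

    `end_idx` is inclusive, mirroring the `+ 1` in the diff.
    """
    # flatten() may return a view; clone() gives the shard its own storage.
    shard = value.flatten()[start_idx:end_idx + 1].clone()
    # CUDA work is queued asynchronously. synchronize() blocks the host until
    # all queued work on the current device has finished.
    # Assumption: guarded here only so the sketch runs without a GPU.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return shard


# Sample optimizer state (made up for illustration): 12 elements, shape (3, 4).
exp_avg = torch.arange(12, dtype=torch.float32).reshape(3, 4)
shard = shard_state_value(exp_avg, start_idx=4, end_idx=7)
print(shard.tolist())  # [4.0, 5.0, 6.0, 7.0]
```

Because of the `clone()`, writing to the returned shard does not modify the original state tensor.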
