diff --git a/src/transformers/training_args.py b/src/transformers/training_args.py
index 8ef4f3c54a0a60..35df2342dfc982 100644
--- a/src/transformers/training_args.py
+++ b/src/transformers/training_args.py
@@ -483,8 +483,8 @@ class TrainingArguments:
                     If `"True"`, each individually wrapped FSDP unit will broadcast module parameters from rank 0
                     to ensure they are the same across all ranks after initialization
                 - cpu_ram_efficient_loading (`bool`, *optional*, defaults to `False`)
-                    If `"True"`, only the first process loads the pretrained model checkpoint while all other processes
-                    have empty weights. When this setting is True, `sync_module_states` also must to be True,
+                    If `True`, only the first process loads the pretrained model checkpoint while all other processes
+                    have empty weights. When this setting is `True`, `sync_module_states` must also be `True`,
                     otherwise all the processes except the main process would have random weights leading to unexpected
                     behaviour during training.
                 - activation_checkpointing (`bool`, *optional*, defaults to `False`):
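
For reference, a minimal sketch of how the two documented options are passed together through `fsdp_config`. The `output_dir` value and the `fsdp` strategy string below are illustrative choices, not something prescribed by this change; the point is only that `cpu_ram_efficient_loading` and `sync_module_states` are enabled as a pair, as the docstring requires.

```python
from transformers import TrainingArguments

# Illustrative FSDP setup: cpu_ram_efficient_loading=True means only rank 0
# loads the pretrained checkpoint, so sync_module_states=True is needed to
# broadcast the real weights from rank 0 to the other (empty-weight) ranks.
training_args = TrainingArguments(
    output_dir="fsdp-run",            # hypothetical output path
    fsdp="full_shard auto_wrap",      # any FSDP sharding strategy
    fsdp_config={
        "cpu_ram_efficient_loading": True,
        "sync_module_states": True,
    },
)
```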