[dpo_trainer] Fixed a compatibility bug with deepspeed when initializing reference_model #1123

Closed

Conversation

Emperorizzis

When training a model using DeepSpeed + DPO (with Warmup), the following error occurs:
(This issue has also been mentioned in #955).

File "xxx/deepspeed/runtime/lr_schedules.py", line 661, in __init__ self.warmup_num_steps = max(2, warmup_num_steps) TypeError: '>' not supported between instances of 'str' and 'int'

The dpo_trainer initializes both the model and the reference_model from the same deepspeed config. Training hyperparameters are set through TrainingArguments & Trainer, and the reference_model only needs deepspeed for forward computation. The scheduler-related parameters are therefore not just the source of this error but also redundant, and they should be removed.
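A minimal sketch of that idea (a hypothetical helper of mine, not the actual patch): copy the training DeepSpeed config and drop the sections that only matter for a model being optimized, before wrapping the frozen reference_model.

import copy


def prepare_ref_model_config(ds_config: dict) -> dict:
    """Return a copy of the DeepSpeed config suitable for a forward-only model."""
    config = copy.deepcopy(ds_config)
    # The reference model is never stepped: its optimizer section is
    # unnecessary, and the scheduler section (whose "auto" placeholders are
    # still unresolved strings here) is the direct cause of the TypeError.
    for key in ("optimizer", "scheduler"):
        config.pop(key, None)
    return config


# Usage sketch, wrapping the frozen reference model for forward passes only:
# engine, *_ = deepspeed.initialize(model=ref_model, config=prepare_ref_model_config(ds_config))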

@kashif (Collaborator) commented Dec 21, 2023

Nice catch @Emperorizzis, checking!

kashif added the 🏋 DPO (Related to DPO) label Dec 21, 2023
@kashif (Collaborator) commented Dec 21, 2023

@Emperorizzis can you also share your deepspeed config to make sure I can reproduce it? thanks!


@Emperorizzis (Author)

> @Emperorizzis can you also share your deepspeed config to make sure I can reproduce it? thanks!

Of course!

Below is my deepspeed config:

{
  "bfloat16": {
      "enabled": true
  },
  "fp16": {
      "enabled": false,
      "loss_scale": 0,
      "loss_scale_window": 1000,
      "initial_scale_power": 16,
      "hysteresis": 2,
      "min_loss_scale": 1
  },
  "optimizer": {
      "type": "AdamW",
      "params": {
          "lr": "auto",
          "weight_decay": "auto",
          "betas": "auto",
          "eps": "auto",
          "torch_adam": true,
          "adam_w_mode": true
      }
  },
  "scheduler": {
      "type": "WarmupDecayLR",
      "params": {
          "warmup_min_lr": 1e-7,
          "warmup_max_lr": 1e-6,
          "warmup_num_steps": "auto",
          "total_num_steps": "auto"
      }
  },
  "zero_optimization": {
      "stage": 3,
      "overlap_comm": true,
      "contiguous_gradients": true,
      "sub_group_size": 1e12,
      "reduce_bucket_size": "auto",
      "stage3_prefetch_bucket_size": "auto",
      "stage3_param_persistence_threshold": "auto",
      "stage3_max_live_parameters": 1e9,
      "stage3_max_reuse_distance": 1e9,
      "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 1e5,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}

Also, to save memory, I loaded the model and reference_model after initializing the TrainingArguments (which automatically set up the deepspeed backend).
(I noticed that the example dpo.py script loads the model before initializing the TrainingArguments, which can run out of memory when the model is very large, e.g. 70B or above.)
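A sketch of that loading order (my illustration, with placeholder paths; assuming a standard transformers + DeepSpeed ZeRO-3 setup): constructing TrainingArguments with a deepspeed config first activates the HF DeepSpeed integration, so the subsequent from_pretrained calls load under zero.Init and shard the weights across ranks instead of materializing the full model on every GPU.

from transformers import AutoModelForCausalLM, TrainingArguments

# Creating TrainingArguments first registers the DeepSpeed (ZeRO-3) config.
training_args = TrainingArguments(
    output_dir="dpo-output",     # placeholder
    deepspeed="ds_config.json",  # the config shown above
    bf16=True,
)

# Loaded *after* TrainingArguments: each rank holds only its own partition.
model = AutoModelForCausalLM.from_pretrained("path/to/base-model")      # placeholder
ref_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")  # placeholder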

@kashif (Collaborator) commented Dec 22, 2023

thanks @Emperorizzis. So one issue is that DeepSpeed ZeRO-3 together with pre-computing the reference log-probs does not work, because the private HF Trainer methods initialize the dataloader before the model, and we currently raise an error:

https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L358-L36
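(A paraphrase of the guard in question, from memory rather than the verbatim TRL source, shown as a standalone check with illustrative variables:)

zero_stage = 3                   # from the DeepSpeed plugin
precompute_ref_log_probs = True  # DPOTrainer argument

if zero_stage == 3 and precompute_ref_log_probs:
    raise ValueError(
        "precompute_ref_log_probs is not supported with DeepSpeed ZeRO-3; "
        "compute the reference log-probs during training instead."
    )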

@Emperorizzis (Author)

> so one issue is that DeepSpeed ZeRO-3 together with pre-computing the reference log-probs does not work, because the private HF Trainer methods initialize the dataloader before the model, and we currently raise an error

Thanks for replying! So the reference_model does need to be initialized from a different deepspeed config (at the very least, with the superfluous data-related config removed).

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this Jan 28, 2024
@yiranma0 commented Mar 4, 2024

So does this issue remain unsolved in the latest version, 0.7.11? I ran into the same error using the example script dpo.py.
