DDPO checkpoint #2505

Open
5 of 9 tasks
nguyenhoa-uit opened this issue Dec 20, 2024 · 3 comments
Open
5 of 9 tasks

DDPO checkpoint ú· #2505

nguyenhoa-uit opened this issue Dec 20, 2024 · 3 comments
Labels
🐛 bug Something isn't working 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute ⏳ needs more info Additional information or clarification is required to proceed

Comments

@nguyenhoa-uit

System Info

Colab Pro usage

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I trained a model for 500 epochs with DDPO and pushed it to the Hugging Face Hub; I also saved some checkpoints locally.
However, when I wanted to train that saved model for an additional 100 epochs, none of the approaches I tried worked.
First, I used the following code, but it failed with a wrapper error (much like when we run inference directly):

```python
pipeline = DefaultDDPOStableDiffusionPipeline(
    "my-finetuned-model",
)
```

Then I used load_lora_weights (a sketch is below); I could train without any code changes, but the trained model behaved just like the base model.
Finally, using a checkpoint, I could also run training without any changes.
Please give me some advice about using checkpoints or saved models.
Thanks.
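
For reference, here is a minimal sketch of the load_lora_weights attempt (model names are placeholders, and I am assuming the wrapper exposes the underlying diffusers pipeline as sd_pipeline):

```python
# Minimal sketch of the load_lora_weights attempt (model names are placeholders).
from trl import DefaultDDPOStableDiffusionPipeline

# Start again from the base model used for the first 500-epoch run.
pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5",
    use_lora=True,
)

# Load the LoRA weights pushed to the Hub after the first run
# (assuming the wrapper exposes the inner diffusers pipeline as `sd_pipeline`).
pipeline.sd_pipeline.load_lora_weights("my-finetuned-model")
```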

Expected behavior

Please give me some advice on using a checkpoint or a saved model to fine-tune for some more epochs.

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete
@qgallouedec qgallouedec added 🐛 bug Something isn't working ⏳ needs more info Additional information or clarification is required to proceed 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute labels Dec 20, 2024
@metric-space
Contributor

@nguyenhoa-uit I can help out with this as this was code I wrote more than a year ago. Mind you, I'll be very very slow. Let me take a look

@metric-space
Contributor

@nguyenhoa-uit could you try this bit: https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_config.py#L64 ?

@nguyenhoa-uit
Author

When I set resume_from in the config file, I ran into an error at https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_trainer.py#L541C20-L541C42
When I bypassed that line with a try/except, training did not use the parameters from the checkpoint but those of the base model.
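
For completeness, this is roughly how I set resume_from (paths and model names are placeholders, and the reward/prompt functions below just stand in for the ones in my actual script):

```python
# Rough sketch of the resume_from attempt (paths, model names, and the
# reward/prompt functions are placeholders for those in my actual script).
import torch

from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline


def prompt_fn():
    # Placeholder prompt function: returns (prompt, metadata).
    return "a photo of a cat", {}


def reward_fn(images, prompts, metadata):
    # Placeholder reward function: one scalar reward per image, plus metadata.
    return torch.zeros(len(images)), {}


config = DDPOConfig(
    num_epochs=100,
    # Directory containing the checkpoint_* folders written by the first run;
    # per ddpo_config.py#L64 this should make the trainer resume from the latest one.
    resume_from="./save/checkpoints",
)

pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5",
    use_lora=True,
)

trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```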
