DDPO checkpoint #2505

Open
5 of 9 tasks
nguyenhoa-uit opened this issue Dec 20, 2024 · 3 comments
Open
5 of 9 tasks

DDPO checkpoint ú· #2505

nguyenhoa-uit opened this issue Dec 20, 2024 · 3 comments
Labels
🐛 bug Something isn't working 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute ⏳ needs more info Additional information or clarification is required to proceed

Comments

@nguyenhoa-uit

System Info

Colab Pro usage

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I trained a model for 500 epochs with DDPO and pushed it to the Hugging Face Hub; I also saved some checkpoints locally.
However, when I wanted to train that saved model for an additional 100 epochs, none of the approaches I tried worked.
First, I used the following code, but it failed with a wrapper error (much like when we run inference directly):

```python
pipeline = DefaultDDPOStableDiffusionPipeline(
    "my-finetuned-model",
)
```

Then I used load_lora_weights (a sketch is below); I could train without any code changes, but the trained model behaved just like the base model.
Finally, using a checkpoint, I could also run training without any changes.
Please give me some advice about using checkpoints or saved models.
Thanks.
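
For reference, here is a minimal sketch of the load_lora_weights attempt (model names are placeholders, and I am assuming the wrapper exposes the underlying diffusers pipeline as sd_pipeline):

```python
# Minimal sketch of the load_lora_weights attempt (model names are placeholders).
from trl import DefaultDDPOStableDiffusionPipeline

# Start again from the base model used for the first 500-epoch run.
pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5",
    use_lora=True,
)

# Load the LoRA weights pushed to the Hub after the first run
# (assuming the wrapper exposes the inner diffusers pipeline as `sd_pipeline`).
pipeline.sd_pipeline.load_lora_weights("my-finetuned-model")
```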

Expected behavior

Please give me some advice on using a checkpoint or a saved model to fine-tune for some more epochs.

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete
@qgallouedec qgallouedec added 🐛 bug Something isn't working ⏳ needs more info Additional information or clarification is required to proceed 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute labels Dec 20, 2024
@metric-space
Contributor

@nguyenhoa-uit I can help out with this as this was code I wrote more than a year ago. Mind you, I'll be very very slow. Let me take a look

@metric-space
Contributor

@nguyenhoa-uit could you try this bit: https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_config.py#L64 ?

@nguyenhoa-uit
Author

When I set resume_from in the config file, I ran into an error at https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_trainer.py#L541C20-L541C42
When I bypassed that line with a try/except, training did not use the parameters from the checkpoint but those of the base model.
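
For completeness, this is roughly how I set resume_from (paths and model names are placeholders, and the reward/prompt functions below just stand in for the ones in my actual script):

```python
# Rough sketch of the resume_from attempt (paths, model names, and the
# reward/prompt functions are placeholders for those in my actual script).
import torch

from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline


def prompt_fn():
    # Placeholder prompt function: returns (prompt, metadata).
    return "a photo of a cat", {}


def reward_fn(images, prompts, metadata):
    # Placeholder reward function: one scalar reward per image, plus metadata.
    return torch.zeros(len(images)), {}


config = DDPOConfig(
    num_epochs=100,
    # Directory containing the checkpoint_* folders written by the first run;
    # per ddpo_config.py#L64 this should make the trainer resume from the latest one.
    resume_from="./save/checkpoints",
)

pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5",
    use_lora=True,
)

trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```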
