
Fix _save_checkpoint for online methods #2288

Merged · 5 commits merged into main from fix-save-checkpoint · Oct 31, 2024
Conversation

@qgallouedec (Member) commented Oct 28, 2024

What does this PR do?

The change in huggingface/transformers#31817 is breaking for TRL, since it removes the metrics argument from _save_checkpoint.

We need to adopt the new style, which relies on the new Trainer._determine_best_metric method. We copy this method from Trainer into TRL to ensure compatibility.

Compatibility

| TRL \ Transformers | v4.46 | dev |
| --- | --- | --- |
| v0.11 | Not affected | No |
| dev | Yes, we copy-paste the new Trainer._determine_best_metric method into OnlineDPOTrainer | Yes |

Note that once we've raised the minimum required transformers version above 4.46, we will be able to remove OnlineDPOTrainer._determine_best_metric.
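The deprecation plan above (drop the copied method once the minimum transformers version exceeds 4.46) amounts to a simple version gate. A minimal sketch, assuming a stdlib-only version comparison; the helper names are illustrative, not the actual TRL code:

```python
# Illustrative sketch of the compatibility gate: the copied
# Trainer._determine_best_metric is only needed while TRL still
# supports transformers releases up to and including 4.46.

def _parse(version: str) -> tuple:
    """Parse an 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def needs_copied_method(min_supported_transformers: str) -> bool:
    """True while the copied _determine_best_metric must be kept."""
    return _parse(min_supported_transformers) <= _parse("4.46.0")

print(needs_copied_method("4.46.0"))  # True: copy still needed
print(needs_copied_method("4.47.0"))  # False: native method available
```

In the real codebase the gate is implicit: the override simply shadows the upstream method until the minimum version bump makes it safe to delete.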

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec changed the title Update trainer_utils import and save strategy in online_dpo_trainer.py Fix _save_checkpoint for online methods Oct 30, 2024
```diff
@@ -30,7 +30,7 @@ jobs:
           python -m pip install --upgrade pip
           # install PEFT & transformers from source
           pip install -U git+https://github.com/huggingface/peft.git
-          pip install -U git+https://github.com/huggingface/transformers.git
+          pip install -U git+https://github.com/huggingface/transformers.git@f339042b0b8bdc0b57a70d37f67cafbea960a2ab
```
@qgallouedec (Member Author):
There is another recent breaking change. I pinned this commit so the CI can run anyway; I will fix the new breaking change in a follow-up PR.

Collaborator:

Not for this PR, but do you think we should add a CI workflow that tests against our lowest supported version of transformers? That might have caught the issue where the examples broke on main when processing_class was added.

@qgallouedec (Member Author):

Good point. For ref, #2298 aims to make tests clearer/cleaner. I'll extend the work with another PR that implements your suggestion.

@qgallouedec (Member Author):

pip doesn't yet have an option to prefer the minimum version (pypa/pip#8085). Let's do it manually.
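A minimal sketch of what "doing it manually" could look like in a CI step; this is a hypothetical workflow fragment, and `X.Y.Z` is a placeholder for the actual minimum transformers version declared in TRL's dependencies, which is not filled in here:

```shell
# Hypothetical CI step: install the lowest supported transformers release
# explicitly, since pip cannot be told to prefer minimum versions.
# Replace X.Y.Z with the minimum pinned in TRL's dependency specification.
pip install "transformers==X.Y.Z"
pip install -e .
pytest tests/
```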

@qgallouedec (Member Author):

> There is another recent breaking change. I set this commit to make the CI run anyway. I will fix the new breaking change in a followup PR

This other breaking change is solved here: #2302

@qgallouedec (Member Author) commented Oct 31, 2024:

> add a CI workflow that tests against our lowest supported version of transformers

Done in #2303

@qgallouedec qgallouedec requested a review from lewtun October 30, 2024 15:53
@qgallouedec (Member Author):
@muellerzr @SunMarc, could at least one of you take a look please? 🙏

@muellerzr (Contributor) left a comment:

Thanks! Sorry that's a bit breaking (though I suppose it's somewhat understandable, since it's _save_checkpoint). The fix makes sense to me, with the deprecation handled on your side :)

@qgallouedec qgallouedec mentioned this pull request Oct 31, 2024
@lewtun (Member) left a comment:

Thanks for the fix!

@qgallouedec qgallouedec merged commit bb56c6e into main Oct 31, 2024
10 of 12 checks passed
@qgallouedec qgallouedec deleted the fix-save-checkpoint branch October 31, 2024 11:35