Enforce saving at end of training if saving option chosen #30160

muellerzr · 2024-04-10T12:40:42Z

What does this PR do?

Currently we save at the end of training if an epoch strategy is chosen, but not if a steps strategy is chosen. This can be undesirable because the end of a user's training might not be saved at all, sometimes upwards of 10% of it, as seen here: #28539

This PR mimics the behavior of the epoch strategy by also saving the model at the end of training when a steps strategy is chosen.
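A minimal sketch of the behavior change, not the actual `Trainer` code: the hypothetical helper `save_points` below lists the steps at which a checkpoint would be written under a steps strategy, with and without a forced save on the final step. The function name, its parameters, and the example values (`max_steps=1000`, `save_steps=300`) are assumptions chosen purely for illustration.

```python
from typing import List


def save_points(max_steps: int, save_steps: int, save_at_end: bool) -> List[int]:
    """Return the global steps at which a checkpoint would be written.

    Hypothetical helper for illustration only; it does not call into
    the Trainer and only models the interval logic described above.
    """
    points = []
    for step in range(1, max_steps + 1):
        on_interval = step % save_steps == 0  # regular steps-based save
        is_last = step == max_steps           # final training step
        if on_interval or (save_at_end and is_last):
            points.append(step)
    return points


# Without a forced final save, the tail of training is never checkpointed.
before = save_points(max_steps=1000, save_steps=300, save_at_end=False)
# With a forced final save, the last step is always checkpointed.
after = save_points(max_steps=1000, save_steps=300, save_at_end=True)

print(before)  # [300, 600, 900]
print(after)   # [300, 600, 900, 1000]
```

With these illustrative values, the last 100 steps (10% of the run) are only checkpointed when the final step also triggers a save.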

I do not believe it makes sense to make this configurable, because a user should always have access to the end-resulting model when training!

Fixes #28539

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts @pacman100

muellerzr · 2024-04-10T12:54:32Z

We can certainly make this configurable as an arg, but I'm not sure it makes sense to do so.

HuggingFaceDocBuilderDev · 2024-04-10T13:14:02Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts

Thanks for working on this! Agreed, I think always saving on the last step is a good idea.

Just some questions about the tests.

tests/trainer/test_trainer.py

pacman100

Thank you, @muellerzr, for enabling the model to be saved at the end of training when using the --save_steps strategy. Left a nit.

src/transformers/trainer_callback.py

muellerzr · 2024-04-29T17:30:10Z

@amyeroberts good for a re-review. No functional changes happened; I just switched out the test logic to make it more obvious what's going on.

amyeroberts

Thanks for adding!

* Enforce saving at end of training
* Fix test
* Rework test
* Fixup tests'
* Update comment based on sourab feedback
* Clean

muellerzr added the trainer label Apr 10, 2024

muellerzr requested review from pacman100 and amyeroberts April 10, 2024 12:40

muellerzr mentioned this pull request Apr 10, 2024

load_best_model_at_end is inconsistent with evaluation (and save) logic at end of training #28539

Closed

4 tasks

amyeroberts reviewed Apr 18, 2024

tests/trainer/test_trainer.py (resolved)

tests/trainer/test_trainer.py (resolved)

tests/trainer/test_trainer.py (outdated, resolved)

pacman100 reviewed Apr 29, 2024

src/transformers/trainer_callback.py (outdated, resolved)

muellerzr requested a review from amyeroberts April 29, 2024 17:29

muellerzr force-pushed the muellerzr-load-best-model branch from cc684e5 to 9bbb268 Compare April 29, 2024 17:58

muellerzr added 5 commits May 15, 2024 14:47

Enforce saving at end of training (17b141a)
Fix test (57dbe85)
Rework test (b73f1a4)
Fixup tests' (370e6cf)
Update comment based on sourab feedback (4002cb8)

muellerzr force-pushed the muellerzr-load-best-model branch from ef951dc to 4002cb8 Compare May 15, 2024 18:47

Clean (95b0183)

muellerzr requested a review from pacman100 May 15, 2024 18:53

pacman100 approved these changes May 16, 2024

amyeroberts approved these changes May 20, 2024

muellerzr merged commit daf281f into main May 21, 2024
21 checks passed

muellerzr deleted the muellerzr-load-best-model branch May 21, 2024 11:50

itazap pushed a commit that referenced this pull request May 21, 2024

Enforce saving at end of training if saving option chosen (#30160)

f553214

itazap pushed a commit that referenced this pull request May 21, 2024

Enforce saving at end of training if saving option chosen (#30160)

08f6cee

itazap pushed a commit that referenced this pull request May 22, 2024

Enforce saving at end of training if saving option chosen (#30160)

b480f62

itazap pushed a commit that referenced this pull request May 24, 2024

Enforce saving at end of training if saving option chosen (#30160)

4cf1249
