-
Notifications
You must be signed in to change notification settings - Fork 27.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New option called "best"
for args.save_strategy
.
#31817
New option called "best"
for args.save_strategy
.
#31817
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Gentle ping @muellerzr @SunMarc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this may seem hacky to you, I think this perfectly expands our given system. Any chance you could add a few tests to this? (Over in test_trainer.py
). Very nice work!
Thanks! And sure, I think I'll have some time to work on it over the weekend. |
8c9c069
to
367c1ff
Compare
@muellerzr (cc. @SunMarc) Hello. I've implemented a new test case inside of The first test case is when a value for I've changed the DefaultFlowCallback object so that if Please let me know if there are any problems or additional changes that I should make! After inspecting the tests it seems like I wasn't supposed to manually alter the training arguments. The most recent two commits undid those. |
@muellerzr @SunMarc No rush, but just a gentle nudge/reminder! Thanks. |
@muellerzr Hi. Just wondering if this PR is still relevant or not. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder @seanswyi ! I've added a suggestion. Please also fix the conflits and we should be good to merge !
629b1a1
to
a852399
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a small nit!
b336efc
to
a74d68c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating !
1. Logic to determine the best logic was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best' then the TrainerControl is updated accordingly.
Same as IntervalStrategy, but with a new attribute called BEST.
`save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring.
1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved.
The Trainer saves a checkpoints at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well.
Co-authored-by: Arthur <[email protected]>
Changes were made for consistency and also to fix a potential bug.
df8721a
to
38216fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks for iterating!
@ArthurZucker @SunMarc @muellerzr Sorry to keep pinging you guys on this, but I'm noticing that the tests seem to be possibly stuck after the merge. Could anybody check this when they have the chance? Thanks. https://github.com/huggingface/transformers/runs/32165247051 |
Indeed, thanks for noticing @seanswyi but don't worry, I check the most recent CI and the test is passing. Can you cancel the run @ArthurZucker ? Or it will just cancel by itself after a few days ? |
Ah, got it. Thanks for following up! |
* Add _determine_best_metric and new saving logic. 1. Logic to determine the best logic was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best' then the TrainerControl is updated accordingly. * Added SaveStrategy. Same as IntervalStrategy, but with a new attribute called BEST. * IntervalStrategy -> SaveStrategy * IntervalStratgy -> SaveStrategy for save_strat. * Interval -> Save in docstring. * Updated docstring for save_strategy. * Added SaveStrategy and made according changes. `save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring. * Changes from `make fixup`. * Removed redundant metrics argument. * Added new test_save_best_checkpoint test. 1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved. * Changed should_training_end saving logic. The Trainer saves a checkpoints at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well. * `args.metric_for_best_model` default to loss. * Undo metric_for_best_model update. * Remove checking metric_for_best_model. * Added test cases for loss and no metric. * Added error for metric and changed default best_metric. * Removed unused import. * `new_best_metric` -> `is_new_best_metric` Co-authored-by: Arthur <[email protected]> * Applied `is_new_best_metric` to all. Changes were made for consistency and also to fix a potential bug. --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: Zach Mueller <[email protected]>
* Add _determine_best_metric and new saving logic. 1. Logic to determine the best logic was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best' then the TrainerControl is updated accordingly. * Added SaveStrategy. Same as IntervalStrategy, but with a new attribute called BEST. * IntervalStrategy -> SaveStrategy * IntervalStratgy -> SaveStrategy for save_strat. * Interval -> Save in docstring. * Updated docstring for save_strategy. * Added SaveStrategy and made according changes. `save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring. * Changes from `make fixup`. * Removed redundant metrics argument. * Added new test_save_best_checkpoint test. 1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved. * Changed should_training_end saving logic. The Trainer saves a checkpoints at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well. * `args.metric_for_best_model` default to loss. * Undo metric_for_best_model update. * Remove checking metric_for_best_model. * Added test cases for loss and no metric. * Added error for metric and changed default best_metric. * Removed unused import. * `new_best_metric` -> `is_new_best_metric` Co-authored-by: Arthur <[email protected]> * Applied `is_new_best_metric` to all. Changes were made for consistency and also to fix a potential bug. --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: Zach Mueller <[email protected]>
@@ -419,6 +419,12 @@ def __init__( | |||
raise ValueError( | |||
f"You have set `args.eval_strategy` to {args.eval_strategy} but you didn't pass an `eval_dataset` to `Trainer`. Either set `args.eval_strategy` to `no` or pass an `eval_dataset`. " | |||
) | |||
if args.save_strategy == SaveStrategy.BEST or args.load_best_model_at_end: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanswyi Q: does this change mean that this logic:
... Will default to "loss" if unspecified and load_best_model_at_end=True (to use the evaluation loss).
is broken now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested it yet, but it seems like it is. The docs probably need an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein Yeah, I don't think that there's any case where metric_for_best_model
defaults to loss.
If |
This solves #35070, cheers! 🥳 |
What does this PR do?
Addresses #31626.
Adds a new option called
"best"
forTrainingArguments.save_strategy
which saves the model checkpoint each time a new best performance is achieved.Details
_save_checkpoint
method was in charge of not only saving the model checkpoint but also determining the best metric and best checkpoint. The logic for determining a new best metric was separated out into the_determine_best_metric
method._determine_best_metric
is called after every evaluation inside of_maybe_log_save_evaluate
. The return valuenew_best_metric
is used to determine whether or not a new best metric has been achieved, and if the save strategy is"best"
then the TrainerControl'sshould_save
flag is switched on.best_metric
does not seem to be tracked by default. Rather, it's only tracked whenargs.metric_for_best_model
is provided. I believe that a best metric of some sort should always be tracked, and therefore if a value is not provided then the validation loss is used to determine a new best.SaveStrategy
was created intrainer_utils
that adds a new attribute calledBEST
to the previousIntervalStrategy
.I'm not sure if I like the rather "hack-y" way that I implemented this by manually switching the TrainerControl's
should_save
flag rather than delegating it to the callback handler like the other flags are dealt with. The problem is that the flags are normally updated before calling_maybe_log_save_evaluate
inside of the inner training loop, which means there's no way for us to determine whether or not a new best metric has been achieved with the current logic. I'm not sure if I'm making sense, but I'm open to any other suggestions.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@muellerzr @SunMarc