[integration] Update Ray Tune integration for Ray 2.7 #26499
Conversation
if "keep_checkpoints_num" in kwargs and kwargs["keep_checkpoints_num"] > 0: | ||
# `keep_checkpoints_num=0` would disabled checkpointing | ||
trainer.use_tune_checkpoints = True | ||
if kwargs["keep_checkpoints_num"] > 1: | ||
logger.warning( | ||
f"Currently keeping {kwargs['keep_checkpoints_num']} checkpoints for each trial. " | ||
"Checkpoints are usually huge, " | ||
"consider setting `keep_checkpoints_num=1`." | ||
) |
Previously, you needed to set a number of checkpoints to keep in order to do checkpointing at all -- now this is just an optional arg.
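A hedged sketch of what that means for callers; only `keep_checkpoints_num` and `trainer.hyperparameter_search(backend="ray")` come from this PR, the trial count and values are made up:

```python
# Illustrative only: checkpointing no longer needs to be switched on via a
# keep count; when `keep_checkpoints_num` is passed it just caps how many
# checkpoints each trial keeps (values > 1 trigger the size warning above).
best_run = trainer.hyperparameter_search(
    backend="ray",
    n_trials=4,               # made-up trial count
    keep_checkpoints_num=1,   # optional; caps checkpoints kept per trial
)
```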
Can we confirm that restoration works correctly? Are there tests for this?
```python
with checkpoint.as_directory() as checkpoint_dir:
    for subdir in os.listdir(checkpoint_dir):
        if subdir.startswith(PREFIX_CHECKPOINT_DIR):
            checkpoint_path = os.path.join(checkpoint_dir, subdir)
```
Doesn't the `checkpoint_dir` get removed after going out of scope? I think we need `checkpoint.to_directory()` here instead.
Good catch, fixed and tested restoration with an updated test (in the Ray repo, which I'll link).
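For reference, a minimal sketch of the fix being discussed, assuming Ray's `Checkpoint` API: `to_directory()` materializes the checkpoint into a directory that outlives the call, whereas `as_directory()` is a context manager whose (possibly temporary) directory can be cleaned up on exit. `checkpoint` and `PREFIX_CHECKPOINT_DIR` are the names from the snippet above.

```python
import os

# Materialize the checkpoint so the path stays valid after this block,
# then locate the HF `checkpoint-*` subdirectory inside it.
checkpoint_dir = checkpoint.to_directory()
for subdir in os.listdir(checkpoint_dir):
    if subdir.startswith(PREFIX_CHECKPOINT_DIR):
        checkpoint_path = os.path.join(checkpoint_dir, subdir)
```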
Tagging @muellerzr @pacman100 -- who can help get this PR in? Thanks!
Thanks for the fix 🤗! I guess TrainerHyperParameterRayIntegrationTest is the test that was broken in transformers. Looks good to me, pinging @pacman100 and @muellerzr to have a second look!
```python
metrics = metrics.copy()
self.objective = self.compute_objective(metrics)
```
Not sure I understand this change.
Oh, this is because the metrics get modified later with some extra keys, so I want to keep the original one intact: https://github.com/huggingface/transformers/pull/26499/files/9fbd0ec915455293eb4a1730e6a473d5c11b1151#diff-ed55888e6665791fe92cc8fc0c499da54f4ace6738551cd9a2591881cda076deR1199
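In other words, the copy protects the caller's dict from the keys added later. A tiny standalone illustration (the values and the `report` helper are made up, not code from the PR):

```python
def report(metrics):
    metrics = metrics.copy()        # rebind to a copy before mutating
    metrics["objective"] = 0.5      # stand-in for the extra keys added later
    return metrics

original = {"eval_loss": 0.25}      # hypothetical metrics from evaluation
report(original)
assert "objective" not in original  # the caller's dict stays intact
```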
Overall LGTM; I agree though that it could perhaps stay a one-liner instead of being split into two :)
Thank you @justinvyu for updating the Ray Tune integration! 🚀✨
@justinvyu FYI, it seems like there are still some errors/issues with this: #27598
Ah thanks for confirming the fix with the user, I'll take a look at that.
@muellerzr I took a look at the user issue, and it's actually something in the example that should be changed. The limitation of the integration is that non-serializable objects (like metrics) cannot be supplied to the HF trainer. This PR is good to merge though.
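Rough illustration of that limitation (the file handle is a made-up stand-in): anything the Trainer holds on to has to survive serialization before Ray can ship it to the Tune workers.

```python
import pickle

# Hypothetical stand-in for a non-serializable object, e.g. an open file handle.
log_file = open("train.log", "w")

try:
    pickle.dumps(log_file)  # whatever the Trainer references must pass this
except TypeError as err:
    print(f"not serializable: {err}")
```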
@justinvyu a quick
Thanks! Failures look to be unrelated? cc @ArthurZucker to make sure
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Changes are unrelated indeed, feel free to rebase and I'll merge then!
The test is also failing on main, merging 😉
What does this PR do?
Ray 2.7 introduced some backwards-incompatible API changes, which broke the HuggingFace transformers integration with Ray Tune (`trainer.hyperparameter_search(backend="ray")`). This PR fixes the integration to use the new APIs. Note that this means that the next transformers version will no longer support `ray<2.7`.

I have manually tested this with the example in the Ray repo here: ray-project/ray#40125
This will be regularly tested in CI once this PR is in: ray-project/ray#39763
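For anyone wanting to try the code path this PR exercises locally, a minimal usage sketch: only `trainer.hyperparameter_search(backend="ray")` comes from the PR description, the search space and trial count are made up, and `trainer` is assumed to have been built with a `model_init` so each trial gets a fresh model.

```python
from ray import tune

# Made-up search space purely for illustration.
def ray_hp_space(trial):
    return {
        "learning_rate": tune.loguniform(1e-5, 1e-3),
        "per_device_train_batch_size": tune.choice([8, 16]),
    }

best_run = trainer.hyperparameter_search(
    backend="ray",
    hp_space=ray_hp_space,
    n_trials=2,
)
print(best_run.hyperparameters)
```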
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@richardliaw