Introduce configured_state arg for accelerator_config #29781
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you for adding `use_configured_state` to control whether or not to reset the `AcceleratorState` when creating a `TrainingArguments` object!
Thanks for adding this!
Could we run a check with a hyperparameter search (just the most common settings) to see whether this has any effect?
cc @amyeroberts
Thanks for iterating on this!
I have a question about the state specification, but I suspect this is more to do with my knowledge of the use of `PartialState` in accelerate.
Have you experimented at all with hyperparam searches to know if this does affect things?
Thanks for adding this! Looks good to me.
I'd like a second review and approval from @pacman100 before merging, as he's more familiar with the state handling and can catch anything I might have missed.
Thank you @muellerzr for adding this!
Force-pushed from 51cd21c to 47fafa1
@amyeroberts requesting a re-review after having to gut much of it on further testing 😅
@pacman100 Can you do a re-review?
Thank you, @muellerzr, for iterating on this. Left a couple comments.
@@ -1615,6 +1619,39 @@ def __post_init__(self):
        if version.parse(version.parse(torch.__version__).base_version) == version.parse("2.0.0") and self.fp16:
            raise ValueError("--optim adamw_torch_fused with --fp16 requires PyTorch>2.0")

        # We need to setup the accelerator config here *before* the first call to `self.device`
Ah, makes sense, as the `PartialState` is usually created in the first call to `self.device` itself.
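To make that ordering concrete, here is a minimal sketch (not the actual transformers code; `ArgsSketch` is a hypothetical stand-in for `TrainingArguments`) of how the state gets created lazily on the first `device` access:

```python
# Minimal sketch, assuming only that `accelerate.PartialState` is available.
from accelerate import PartialState


class ArgsSketch:
    def __init__(self):
        self._distributed_state = None

    @property
    def device(self):
        # The first access creates (or reuses) the PartialState singleton;
        # any accelerator config that should affect it must be handled earlier.
        if self._distributed_state is None:
            self._distributed_state = PartialState(cpu=True)
        return self._distributed_state.device


args = ArgsSketch()
print(args.device)  # PartialState now exists and will be reused from here on
```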
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Force-pushed from 063fdd3 to 257e47a
"`AcceleratorState` or `PartialState` to be defined before calling `TrainingArguments`. "
)
# We rely on `PartialState` to yell if there's issues here (which it will)
self.distributed_state = PartialState(cpu=self.use_cpu)
This still doesn't account for the case where the user passes `--fsdp` but hasn't enabled it via `PartialState`. In general, my comment was about the mismatch between the training arguments and the `PartialState` set by them.
If we just have a `PartialState`, you can still initialize FSDP later, as it's done at the `AcceleratorState` level. I'll do a quick test to verify, but the `PartialState` doesn't care about FSDP.
(`PartialState` only initializes the distributed env; things like the FSDP or DeepSpeed plugin happen later, though DeepSpeed will still be called/activated if the env is set properly when the `PartialState` is created.)
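As a small, hedged illustration of that point (plain accelerate, nothing specific to this PR):

```python
# PartialState only sets up the process/distributed environment; it carries no
# FSDP- or DeepSpeed-specific configuration of its own.
from accelerate import PartialState

state = PartialState()
print(state.distributed_type)                    # e.g. DistributedType.NO on a single process
print(state.process_index, state.num_processes)  # basic process info only

# FSDP/DeepSpeed plugins only come into play later, when the Accelerator /
# AcceleratorState is built, e.g. (hypothetical, depends on how the job is launched):
#   from accelerate import Accelerator
#   from accelerate.utils import FullyShardedDataParallelPlugin
#   accelerator = Accelerator(fsdp_plugin=FullyShardedDataParallelPlugin())
```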
Understood, thank you Zach for the clarifications.
Hello Zach, I still think this has some gaps. Please refer to my comment.
Thanks for adding this feature!
* Introduce configured_state
* Include note on tuning
* Allow for users to have defined a state already
* Include tests
* Add note on hpam tune
* Guard a bit better
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <[email protected]>
* Finish rebase
* Finish rebase
* Guard carefully
* Fixup test
* Refactor
* Fin refactor
* Comment
* Update wrt feedback
---------
Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
This PR starts to allow users to bring in their own `Accelerator`. To start, we are simply allowing users to define a `PartialState` or `AcceleratorState` outside the `TrainingArguments`, and then enable the user to use its state via a new `use_configured_state` arg. For instance, a user can now do:
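(The original snippet is not preserved in this extract; the sketch below reconstructs the intended usage, and the exact `accelerator_config` plumbing may differ slightly from the merged API.)

```python
from accelerate import PartialState
from transformers import TrainingArguments

# Configure the state yourself, *before* building TrainingArguments.
state = PartialState(cpu=True)

# Ask TrainingArguments to reuse the already-configured state instead of
# creating/resetting one itself.
args = TrainingArguments(
    output_dir="output",
    accelerator_config={"use_configured_state": True},
)
```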
And this will use the already-defined state. These states are singletons, so defining one once and calling it later will maintain the same state on subsequent calls. This may lead to issues with hyperparameter tuning, which requires the state to be reset each time, as noted in the doc description.
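A quick demonstration of that singleton behaviour (accelerate's `PartialState`; `_reset_state()` is an internal helper, mentioned only to illustrate what a per-trial reset has to do):

```python
from accelerate import PartialState

a = PartialState(cpu=True)
b = PartialState()           # no new setup happens; the shared state is reused
print(a.device, b.device)    # same device, same underlying state

# A hyperparameter search needs a fresh state per trial, which is why the state
# must be reset between runs (accelerate keeps an internal _reset_state() for this).
PartialState._reset_state()
```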
Fixes a related accelerate issue which occurs when defining an `Accelerator` before the `TrainingArguments`: huggingface/accelerate#2564
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts @pacman100