
Add FSDP config for CPU RAM efficient loading through accelerate #30002

Merged

Conversation

@helloworld1 (Contributor) commented Apr 2, 2024

What does this PR do?

Currently, the environment variable FSDP_CPU_RAM_EFFICIENT_LOADING is read by accelerate but is only set inside the transformers codebase. This change adds a cpu_ram_efficient_loading FSDP option that sets FSDP_CPU_RAM_EFFICIENT_LOADING, so jobs launched via torchrun (or by other means) can take advantage of it through their configs.
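For illustration, a minimal sketch of how the new option could be enabled from a script launched with torchrun; the output directory and sharding strategy below are placeholders, and pairing it with sync_module_states reflects the validation added later in this PR:

```python
from transformers import TrainingArguments

# Sketch: setting the new option via fsdp_config so that accelerate's
# FSDP_CPU_RAM_EFFICIENT_LOADING is honored even for jobs launched with
# torchrun rather than `accelerate launch`.
args = TrainingArguments(
    output_dir="out",             # placeholder
    fsdp="full_shard auto_wrap",  # example sharding strategy
    fsdp_config={
        "cpu_ram_efficient_loading": True,  # option added by this PR
        "sync_module_states": True,         # must accompany it (see validation below)
    },
)
```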

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@pacman100
@muellerzr
@ArthurZucker

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@helloworld1 force-pushed the helloworld1/cpu_ram_efficient_loading branch from 68033fa to 8719a0f on April 4, 2024 20:52
@helloworld1 force-pushed the helloworld1/cpu_ram_efficient_loading branch 2 times, most recently from cf40b24 to 5443e5b on April 15, 2024 16:38
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker requested a review from @muellerzr on April 18, 2024 09:17
@muellerzr (Contributor) left a comment


Thanks! Overall this looks much better to me, just a slight documentation nit. cc @pacman100 so you're aware

src/transformers/training_args.py (review comment, resolved)
@helloworld1 (Contributor, Author) commented

Thanks for the suggestion. @muellerzr, would you re-approve?

@muellerzr requested a review from @amyeroberts on April 18, 2024 16:15
@muellerzr (Contributor) commented

No need for a re-approval in those cases :) In transformers, the green checkmark is the be-all and end-all of approval unless something radically changed afterwards. cc @amyeroberts for final review

@amyeroberts (Collaborator) left a comment


Thanks for adding!

Just a small nit and a request for input validation.

src/transformers/training_args.py (two resolved review comments)
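For context, the requested validation (per the squashed commit message below) ties cpu_ram_efficient_loading to sync_module_states. A rough sketch of what such a check in training_args.py could look like; the exact condition and error message are assumptions:

```python
# Rough sketch of the requested validation (wording and placement assumed):
# with cpu_ram_efficient_loading, only rank 0 loads real weights, so the
# other ranks must receive them via sync_module_states.
if self.fsdp_config.get("cpu_ram_efficient_loading", False) and not self.fsdp_config.get(
    "sync_module_states", False
):
    raise ValueError(
        "When `cpu_ram_efficient_loading` is enabled, `sync_module_states` must also be `True`."
    )
```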
@pacman100 (Contributor) left a comment


Thank you @helloworld1 for adding this!

@helloworld1 force-pushed the helloworld1/cpu_ram_efficient_loading branch from e7a1c8f to e81cd98 on April 19, 2024 04:52
@amyeroberts (Collaborator) left a comment


Thanks for adding and iterating - looks great!

src/transformers/training_args.py (review comment, resolved)
@amyeroberts (Collaborator) commented

Running make fixup should resolve the code quality checks

@helloworld1 force-pushed the helloworld1/cpu_ram_efficient_loading branch from 73d8e4a to a2978fb on April 19, 2024 18:50
@amyeroberts merged commit f16caf4 into huggingface:main on Apr 22, 2024
21 checks passed
zafstojano pushed a commit to zafstojano/transformers that referenced this pull request Apr 22, 2024
Add FSDP config for CPU RAM efficient loading through accelerate (huggingface#30002)

* Add FSDP config for CPU RAM efficient loading

* Style fix

* Update src/transformers/training_args.py

Co-authored-by: Zach Mueller <[email protected]>

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <[email protected]>

* Add sync_module_states and cpu_ram_efficient_loading validation logic

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <[email protected]>

* Style

---------

Co-authored-by: Zach Mueller <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
itazap pushed a commit that referenced this pull request May 14, 2024
Add FSDP config for CPU RAM efficient loading through accelerate (#30002)

(Same squashed commit message as above.)