Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix is_fsdp_enabled() logic and make low_cpu_mem_usage by default #29961

Conversation

helloworld1
Copy link
Contributor

@helloworld1 helloworld1 commented Mar 30, 2024

What does this PR do?

is_fsdp_enabled() logic is incorrectly using "FSDP_CPU_RAM_EFFICIENT_LOADING" envvar, which is only set by accelerate launcher not in the TrainingArguments. This PR removes the condition and only look for "ACCELERATE_USE_FSDP" to determine if fsdp enabled or not.

The "FSDP_CPU_RAM_EFFICIENT_LOADING" now only deciding if low_cpu_mem_usage is used or not.

This fixed the issue that is_fsdp_enabled() is returning False if using torchrun to launch the script making low_cpu_mem_usage unusable.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @pacman100

@ArthurZucker ArthurZucker requested a review from muellerzr March 30, 2024 19:27
@helloworld1 helloworld1 force-pushed the helloworld1/low_cpu_memory_fix branch from 63a83b2 to ef76f46 Compare April 1, 2024 23:04
Copy link
Contributor

@pacman100 pacman100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @helloworld1, Thank you for the proposed change. However, low_cpu_mem_usage can't be default as there were models such as Whisper and few others that were resulting in errors with this being default.

@helloworld1
Copy link
Contributor Author

helloworld1 commented Apr 2, 2024

Hello @helloworld1, Thank you for the proposed change. However, low_cpu_mem_usage can't be default as there were models such as Whisper and few others that were resulting in errors with this being default.

Got it. Thanks for pointing out! @pacman100 Do you think it is a good idea to make low_cpu_mem_usage configurable through fsdp_config?

Currently environment variable FSDP_CPU_RAM_EFFICIENT_LOADING is being read but not set in transformers library. I can add a low_cpu_mem_usage config to set the environment variable so it will apply the low mem.

@helloworld1
Copy link
Contributor Author

I proposed this PR: #30002 to add the FSDP config.

@helloworld1
Copy link
Contributor Author

@pacman100 @muellerzr I am closing this PR in favor of this PR #30002 to make low cpu mem usage option configurable. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants