ORT Models Failing because of the latest fsdp changes on transformers Trainer. #1554

Closed
4 tasks
AdamLouly opened this issue Nov 28, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@AdamLouly
Contributor

System Info

optimum from source
transformers from source

Who can help?

@JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

When running training with ORTModule, all models fail because of the latest changes to the transformers Trainer.
`fsdp` was removed as an attribute, and the update included other changes as well.

I can work on the fix if you guys don't have the bandwidth.

@JingyaHuang
We have also been getting a lot of these types of errors. Can we work on a CI pipeline to spot these failures so we can fix them fast?

Thanks.

Expected behavior

AttributeError: 'ORTTrainer' object has no attribute 'fsdp'
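
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of how a subclass can break when a parent class drops an attribute. It does not use optimum or transformers. All class names, and the `is_fsdp_enabled` replacement flag, are illustrative assumptions and are not taken from this thread or from the actual fix.

```python
class BaseTrainer:
    """Hypothetical stand-in for an upstream trainer after a refactor."""

    def __init__(self):
        # The old `fsdp` attribute is no longer set here.
        # `is_fsdp_enabled` is an assumed replacement name, for illustration only.
        self.is_fsdp_enabled = False


class StaleSubclassTrainer(BaseTrainer):
    """Hypothetical subclass still written against the old parent class."""

    def uses_fsdp(self):
        # Reads an attribute the parent no longer defines.
        return self.fsdp is not None


class DefensiveSubclassTrainer(BaseTrainer):
    """Hypothetical subclass that tolerates both old and new parents."""

    def uses_fsdp(self):
        # Prefer the legacy attribute if it exists, else fall back to the new flag.
        legacy = getattr(self, "fsdp", None)
        if legacy is not None:
            return True
        return bool(getattr(self, "is_fsdp_enabled", False))


def reproduce():
    """Return the AttributeError message raised by the stale subclass, or None."""
    try:
        StaleSubclassTrainer().uses_fsdp()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
    print(DefensiveSubclassTrainer().uses_fsdp())
```

The `getattr` fallback is only one way to stay compatible across parent-class versions. Pinning the upstream version, or updating the subclass to match the new parent, are alternatives.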

@AdamLouly AdamLouly added the bug Something isn't working label Nov 28, 2023
@JingyaHuang JingyaHuang self-assigned this Dec 7, 2023
@JingyaHuang
Contributor

Hi @AdamLouly, thanks a lot for reporting it. The contribution will be super helpful!

@JingyaHuang
Contributor

The CI for ORTTrainer will be restored with PR #1575.

The current issue is caught here: https://github.com/huggingface/optimum/actions/runs/7142761592/job/19452757722?pr=1575

@prathikr
Contributor

@JingyaHuang any updates on this issue? I am also running into it.

@JingyaHuang
Contributor

Hi @prathikr, Adam has opened a PR to fix it: #1586.

@nvijayrania

nvijayrania commented Dec 26, 2023

Hi @JingyaHuang,
Could you please confirm the last stable version that I can use with the latest transformers (4.36.2) and that doesn't have the fsdp issue? I tried 1.16.0, and it throws the same error. Or is there a plan to release a patch soon?

@JingyaHuang
Contributor

Hi @nvijayrania, I just merged the fix contributed by @AdamLouly. ORTTrainer should work as expected if you build optimum from source. We plan a patch release soon (but unfortunately not this week, as team members are OOO).
