Flash Attention for fine-tuning #4
I am getting the error: ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
[2023-10-29 08:38:36,863] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0
What is going on? @pacman100 was able to run the code back in August, but now it is broken when we install the latest transformers.
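For context, a minimal repro sketch of what triggers this error. This assumes a transformers release from before Flash Attention 2 support for the GPTBigCode architecture landed (i.e. anything before 4.35.0); the model id is the one from this issue:

```python
# Repro sketch: on transformers releases that predate FA2 support for
# GPTBigCode (before 4.35.0), requesting Flash Attention 2 raises the
# ValueError quoted above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # -> ValueError: The current architecture does not support Flash Attention 2.0.
)
```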
Hi @prince14322, Flash Attention is now added to transformers.
Great work adding it in transformers, @susnato! That PR is part of the newest release (4.35.0); you can just upgrade transformers.
Hello, we have updated the code to use the FA2 support from 🤗 Transformers instead of the monkey patching. The Transformers support should handle packing as well as non-packing scenarios, resolving this issue.
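For anyone landing here later, a minimal sketch of the built-in FA2 path. It assumes transformers >= 4.35.0; in the 4.35.x releases the flag is `use_flash_attention_2`, which newer releases replaced with `attn_implementation="flash_attention_2"`:

```python
# Sketch: loading a model with the built-in Flash Attention 2 support
# from transformers >= 4.35.0, replacing the old monkey patch.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.bfloat16,  # FA2 only runs in fp16/bf16
    use_flash_attention_2=True,  # 4.35.x flag; newer releases use attn_implementation="flash_attention_2"
)
```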
How can we use Flash Attention v2 for fine-tuning with Hugging Face models?
Does the patch only work for pre-training (or extended pre-training)?
All the discussions mentioned below are for pre-training (or extended pre-training).
Unable to train "bigcode/starcoder" model on 80 A100-80GB GPUs using FSDP huggingface/accelerate#1864
Incorrectness in Flash Attention #1
I would like to fine-tune bigcode/starcoder, a 15.5 billion parameter model with a 2k context length, using A100-80GB GPUs.
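A minimal fine-tuning sketch along these lines, assuming transformers >= 4.35.0 with the built-in FA2 support. The toy dataset and hyperparameters below are placeholders, not the setup from this issue, and a real run on the 15.5B model would additionally need FSDP or DeepSpeed sharding across the A100s:

```python
# Fine-tuning sketch with Flash Attention 2 enabled via transformers >= 4.35.0.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "bigcode/starcoder"  # gated checkpoint: requires accepting the license on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # starcoder's tokenizer has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
)

# Toy dataset standing in for a real fine-tuning corpus.
train_dataset = Dataset.from_dict(
    {"text": ["def add(a, b):\n    return a + b\n"]}
).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="starcoder-ft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # helps fit the model on 80GB cards
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```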