Flash Attention for fine-tuning #4
I am getting the error: ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
[2023-10-29 08:38:36,863] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0
What is going on? @pacman100 was able to run the code back in August, but now it is broken when we install the latest transformers.
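For context, a minimal repro sketch of what triggers this error. This assumes a transformers release from before Flash Attention 2 support for the GPTBigCode architecture landed (i.e. anything before 4.35.0); the model id is the one from this issue:

```python
# Repro sketch: on transformers releases that predate FA2 support for
# GPTBigCode (before 4.35.0), requesting Flash Attention 2 raises the
# ValueError quoted above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # -> ValueError: The current architecture does not support Flash Attention 2.0.
)
```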
Hi @prince14322, Flash Attention is now added to transformers.
Great work adding it in transformers, @susnato! That PR is part of the newest release (4.35.0); you can just upgrade transformers.
Hello, we have updated the code to use the FA2 support from 🤗 Transformers instead of the monkey patching. The Transformers support should handle packing as well as non-packing scenarios, resolving this issue.
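For anyone landing here later, a minimal sketch of the built-in FA2 path. It assumes transformers >= 4.35.0; in the 4.35.x releases the flag is `use_flash_attention_2`, which newer releases replaced with `attn_implementation="flash_attention_2"`:

```python
# Sketch: loading a model with the built-in Flash Attention 2 support
# from transformers >= 4.35.0, replacing the old monkey patch.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.bfloat16,  # FA2 only runs in fp16/bf16
    use_flash_attention_2=True,  # 4.35.x flag; newer releases use attn_implementation="flash_attention_2"
)
```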
How can we use Flash Attention v2 for fine-tuning with Hugging Face models?
Does the patch only work for pre-training (or extended pre-training)?
All the discussions mentioned below are for pre-training (or extended pre-training).
Unable to train "bigcode/starcoder" model on 80 A100-80GB GPUs using FSDP huggingface/accelerate#1864
Incorrectness in Flash Attention #1
I would like to fine-tune bigcode/starcoder, a 15.5 billion parameter model with a 2k context length, using A100-80GB GPUs.
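A minimal fine-tuning sketch along these lines, assuming transformers >= 4.35.0 with the built-in FA2 support. The toy dataset and hyperparameters below are placeholders, not the setup from this issue, and a real run on the 15.5B model would additionally need FSDP or DeepSpeed sharding across the A100s:

```python
# Fine-tuning sketch with Flash Attention 2 enabled via transformers >= 4.35.0.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "bigcode/starcoder"  # gated checkpoint: requires accepting the license on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # starcoder's tokenizer has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
)

# Toy dataset standing in for a real fine-tuning corpus.
train_dataset = Dataset.from_dict(
    {"text": ["def add(a, b):\n    return a + b\n"]}
).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="starcoder-ft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # helps fit the model on 80GB cards
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```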