
Commit

Update flash attention documentation
DavidAfonsoValente committed Jan 30, 2024
1 parent 712191d commit 237c1a3
Showing 1 changed file with 1 addition and 1 deletion.
docs/source/en/perf_infer_gpu_one.md (1 addition, 1 deletion)
@@ -54,7 +54,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
* [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
* [XLMRoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
* [xlm_roberta](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)

You can request to add FlashAttention-2 support for another model by opening a GitHub Issue or Pull Request.
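As context for the documented feature, here is a minimal sketch of how FlashAttention-2 is typically enabled for one of the listed architectures at load time. The checkpoint name, dtype, and device are illustrative assumptions, not part of this commit; it also assumes the `flash-attn` package is installed and a supported GPU is available.

```py
# Illustrative sketch (not part of this commit): load a supported model with
# FlashAttention-2 enabled via the `attn_implementation` argument.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "FacebookAI/xlm-roberta-base"  # example checkpoint; any listed architecture works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # FlashAttention-2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # opt in to the FlashAttention-2 kernels
).to("cuda")                                  # assumes a CUDA GPU with flash-attn installed

inputs = tokenizer("FlashAttention-2 test input", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```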

